Figure 4 | BMC Bioinformatics

Figure 4

From: Prediction of RNA-binding amino acids from protein and RNA sequences


Comparison of the sequence similarity-based method and the feature vector-based method for reducing data redundancy. The sequence similarity-based method removes an entire sequence whenever it is identical or similar to another sequence in the dataset. When similar sequences are eliminated, their binding information is lost as well. Moreover, if a remaining sequence contains repetitive subsequences, those subsequences still generate redundant data. The feature vector-based method first represents every possible subsequence, together with its binding information, as a feature vector. A subsequence is removed only when its feature vector is identical to that of another subsequence. Subsequences with the same amino acid sequence but different binding information are therefore treated as distinct, and both are kept in the training dataset.
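The contrast between the two strategies can be illustrated with a minimal Python sketch. It rests on several simplifying assumptions that are not taken from the article: each protein is a hypothetical `(sequence, labels)` pair with one 0/1 binding label per residue; the sequence-level filter drops only exact duplicates (a real pipeline would also drop sequences above some similarity cutoff); and a "feature vector" is just a fixed-size window of residues, padded with a hypothetical `'X'` symbol, plus the binding label of its central residue. The feature vectors used in the study may encode more information. The function names are illustrative only.

```python
from typing import List, Set, Tuple

# Hypothetical encoding: (amino acid sequence, per-residue binding labels as '0'/'1').
Protein = Tuple[str, str]
FeatureVector = Tuple[str, int]  # (window of residues, label of central residue)


def sequence_level_dedup(proteins: List[Protein]) -> List[Protein]:
    """Simplified sequence-based filter: drop any protein whose sequence is
    identical to one already kept. Its binding labels are discarded with it."""
    seen: Set[str] = set()
    kept: List[Protein] = []
    for seq, labels in proteins:
        if seq not in seen:
            seen.add(seq)
            kept.append((seq, labels))
    return kept


def windows(protein: Protein, w: int) -> List[FeatureVector]:
    """Build one window of odd length w centered on each residue, padding both
    ends with 'X' (assumed padding symbol), paired with the central label."""
    if w % 2 == 0:
        raise ValueError("window length must be odd")
    seq, labels = protein
    half = w // 2
    padded = "X" * half + seq + "X" * half
    return [(padded[i:i + w], int(labels[i])) for i in range(len(seq))]


def feature_level_dedup(proteins: List[Protein], w: int) -> List[FeatureVector]:
    """Keep one copy of each distinct (window, label) vector. Identical windows
    with different labels are different vectors, so both survive."""
    seen: Set[FeatureVector] = set()
    kept: List[FeatureVector] = []
    for protein in proteins:
        for vec in windows(protein, w):
            if vec not in seen:
                seen.add(vec)
                kept.append(vec)
    return kept


if __name__ == "__main__":
    # Same sequence, different (made-up) binding residues.
    prots = [("ACDAC", "01000"), ("ACDAC", "00010")]
    print(len(sequence_level_dedup(prots)))    # 1: second protein's labels are lost
    print(len(feature_level_dedup(prots, 3)))  # 7: ("ACD", 1) and ("ACD", 0) both kept
    # A repetitive sequence yields many identical windows; only distinct ones remain.
    print(len(feature_level_dedup([("AAAAA", "00000")], 3)))  # 3
```

In this toy example the sequence-level filter keeps only one of the two proteins, so the binding residue at position 4 of the second protein disappears, whereas the window-level filter keeps both labelings of `ACD` and collapses the three repeated `AAA` windows of the repetitive sequence into one.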
