Table 3 Characterization of 11,015 mismatched sequence segments in primate sequences, according to nine different features

From: Understanding the causes of errors in eukaryotic protein-coding gene prediction: a case study of primate proteomes

| Class | Feature | No. (%) of errors |
| --- | --- | --- |
| Evidence of gene prediction error | Genomic sequence contains N characters (introns or exons) | 5256 (47.7%) |
| Evidence of gene prediction error | Primate sequence contains short introns (< 30 nucleotides) | 937 (8.5%) |
| Evidence of gene prediction error | 1 human exon aligned with ≥ 3 primate exons | 611 (5.5%) |
| Evidence of gene prediction error | Non-canonical splice sites in human sequence | 237 (2.2%) |
| Evidence of gene prediction error | Frameshift in primate exon sequence | 138 (1.3%) |
| Evidence of false positive error | Human isoform exists that matches primate sequence | 1194 (10.8%) |
| Evidence of false positive error | Multiple alignment error | 244 (2.2%) |
| Evidence of false positive error | In a repeated protein region | 232 (2.1%) |
| Mixed evidence | Mismatch associated with evidence of both gene prediction error and false positive error | 341 (3.1%) |
| Unconfirmed | Conserved in ≥ 4 primates | 1054 (9.6%) |
| | Mismatch associated with evidence of gene prediction error only | 5446 (49.4%) |
| | Mismatch associated with evidence of false positive error only | 4174 (37.9%) |
| | Mismatch associated with at least 1 feature | 7401 (67.2%) |
| | Mismatch associated with 0 features | 3614 (32.8%) |
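
The percentages in the table appear to be each count divided by the 11,015 mismatched segments, rounded to one decimal place. A minimal Python sketch of that arithmetic follows. The names `TOTAL_SEGMENTS`, `counts`, and `percent_of_total` are illustrative assumptions, not taken from the article, and only a subset of rows is shown.

```python
# Hypothetical helper names; the counts are copied from the table above.
TOTAL_SEGMENTS = 11015  # mismatched sequence segments, per the table caption

counts = {
    "Genomic sequence contains N characters": 5256,
    "Primate sequence contains short introns": 937,
    "Human isoform exists that matches primate sequence": 1194,
    "Conserved in >= 4 primates": 1054,
    "Mismatch associated with at least 1 feature": 7401,
    "Mismatch associated with 0 features": 3614,
}


def percent_of_total(count, total=TOTAL_SEGMENTS):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Print each row in the table's "No. (%)" style.
for feature, count in counts.items():
    print(f"{feature}: {count} ({percent_of_total(count)}%)")

# The last two summary rows partition the full set of segments.
summary_sum = (
    counts["Mismatch associated with at least 1 feature"]
    + counts["Mismatch associated with 0 features"]
)
```

Because a single segment can carry several features, the per-feature counts are not expected to sum to the total. Only the "at least 1 feature" and "0 features" rows partition the 11,015 segments.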