Table 3 Characterization of 11,015 mismatched sequence segments in primate sequences, according to nine different features

From: Understanding the causes of errors in eukaryotic protein-coding gene prediction: a case study of primate proteomes

| Class | Feature | No. (%) of errors |
| --- | --- | --- |
| Evidence of gene prediction error | Genomic sequence contains N characters (introns or exons) | 5256 (47.7%) |
| Evidence of gene prediction error | Primate sequence contains short introns (< 30 nucleotides) | 937 (8.5%) |
| Evidence of gene prediction error | 1 human exon aligned with ≥ 3 primate exons | 611 (5.5%) |
| Evidence of gene prediction error | Non-canonical splice sites in human sequence | 237 (2.2%) |
| Evidence of gene prediction error | Frameshift in primate exon sequence | 138 (1.3%) |
| Evidence of false positive error | Human isoform exists that matches primate sequence | 1194 (10.8%) |
| Evidence of false positive error | Multiple alignment error | 244 (2.2%) |
| Evidence of false positive error | In a repeated protein region | 232 (2.1%) |
| Mixed evidence | Mismatch associated with evidence of both gene prediction error and false positive error | 341 (3.1%) |
| Unconfirmed | Conserved in ≥ 4 primates | 1054 (9.6%) |
| | Mismatch associated with evidence of gene prediction error only | 5446 (49.4%) |
| | Mismatch associated with evidence of false positive error only | 4174 (37.9%) |
| | Mismatch associated with at least 1 feature | 7401 (67.2%) |
| | Mismatch associated with 0 features | 3614 (32.8%) |
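
The percentages in the table appear to be each count divided by the 11,015 mismatched segments named in the table title. The following is a minimal sketch of that arithmetic, not code from the article: the names `TOTAL`, `ROWS`, and `percent_of_total` are hypothetical, the feature labels are abbreviated, and rounding to one decimal place is an assumption about how the reported values were derived.

```python
# Counts and percentages transcribed from Table 3.
# TOTAL is the number of mismatched segments given in the table title.
TOTAL = 11015

# (abbreviated feature label, count, reported percentage)
ROWS = [
    ("Genomic sequence contains N characters", 5256, 47.7),
    ("Short introns (< 30 nt) in primate sequence", 937, 8.5),
    ("1 human exon aligned with >= 3 primate exons", 611, 5.5),
    ("Non-canonical splice sites in human sequence", 237, 2.2),
    ("Frameshift in primate exon sequence", 138, 1.3),
    ("Human isoform matches primate sequence", 1194, 10.8),
    ("Multiple alignment error", 244, 2.2),
    ("In a repeated protein region", 232, 2.1),
    ("Both gene prediction and false positive evidence", 341, 3.1),
    ("Conserved in >= 4 primates", 1054, 9.6),
    ("Gene prediction error evidence only", 5446, 49.4),
    ("False positive error evidence only", 4174, 37.9),
    ("At least 1 feature", 7401, 67.2),
    ("0 features", 3614, 32.8),
]


def percent_of_total(count, total=TOTAL):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Collect any rows whose recomputed percentage differs from the reported one.
disagreements = [
    (label, count, reported, percent_of_total(count))
    for label, count, reported in ROWS
    if percent_of_total(count) != reported
]

for label, count, reported in ROWS:
    print(f"{label:50s} {count:5d}  {percent_of_total(count):5.1f}%  (reported {reported}%)")

# The last two rows ("at least 1 feature" and "0 features") should together
# account for every mismatched segment.
with_feature = ROWS[-2][1]
without_feature = ROWS[-1][1]
print("Partition check:", with_feature + without_feature == TOTAL)
```

The script prints each recomputed percentage next to the reported one and leaves `disagreements` empty when every row agrees under the rounding assumption above.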