Table 1 Detailed description of the real data, from the Ensembl-Compara database, used for the evaluation

From: SplicedFamAlign: CDS-to-gene spliced alignment and identification of transcript orthology groups

Family             FAM86                   MAG                     TP53
Species            Gene_ID             #   Gene_ID             #   Gene_ID             #
Human              ENSG00000158483     3   ENSG00000105492     6   ENSG00000141510     15
                   ENSG00000186523     4   ENSG00000142512     7   ENSG00000073282     11
                   ENSG00000145002     2   ENSG00000105695     4   ENSG00000078900     9
Chimp.             ENSPTRG00000007738  1   ENSPTRG00000011374  1   ENSPTRG00000008703  1
Mouse              ENSMUSG00000022544  1   ENSMUSG00000051504  4   ENSMUSG00000022510  8
Rat                ENSRNOG00000002876  1   ENSRNOG00000021023  2   ENSRNOG00000010756  4
Cow                ENSBTAG00000008222  1   ENSBTAG00000017044  1   ENSBTAG00000001069  1
Chicken            ENSGALG00000002044  1   -                   -   ENSGALG00000007324  2
Lizard             -                   -   ENSACAG00000005408  1   -                   -
Total              8 genes             14  8 genes             26  8 genes             51
Avg. CDS length    726                     1397.42                 1277.41
Avg. gene length   22782.37                26049.75                109457.75
Avg. pairwise PID  56.57                   41.41                   57.21
  1. For each gene family, the following information is given: the species name, the Ensembl gene identifier, the number of CDS for each gene (#), the average CDS length, the average gene length, and the average pairwise percent sequence identity (PID). The average pairwise PID was computed from pairwise alignments of the CDS, obtained from the multiple alignments of their protein families provided by Ensembl-Compara.
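
As an illustration of the last row of the table, the following minimal sketch computes an average pairwise PID from sequences that are already aligned to one another. It is not the authors' code. The PID definition used here (identical columns divided by columns in which neither sequence has a gap) is an assumption, since the footnote does not state which definition was applied. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def percent_identity(a, b, gap="-"):
    """PID of two aligned sequences.

    Assumed definition: identical columns / columns where neither
    sequence has a gap, times 100. Other definitions exist.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def average_pairwise_pid(aligned):
    """Mean PID over all unordered pairs of aligned sequences."""
    names = sorted(aligned)
    pids = [percent_identity(aligned[p], aligned[q])
            for p, q in combinations(names, 2)]
    return sum(pids) / len(pids) if pids else 0.0


# Toy aligned sequences (made up for illustration, not taken from the table).
toy = {
    "cds_a": "ATGGCC-TGA",
    "cds_b": "ATGGCTTTGA",
    "cds_c": "ATGACC-TGA",
}
print(round(average_pairwise_pid(toy), 2))
```

With real data, the aligned sequences would come from the pairwise CDS alignments induced by the Ensembl-Compara protein family alignments; that extraction step is not shown here.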