Table 1 Detailed description of the real data, from the Ensembl-Compara database, used for the evaluation

From: SplicedFamAlign: CDS-to-gene spliced alignment and identification of transcript orthology groups

 

| Species | FAM86 Gene_ID | FAM86 # | MAG Gene_ID | MAG # | TP53 Gene_ID | TP53 # |
|---|---|---|---|---|---|---|
| Human | ENSG00000158483 | 3 | ENSG00000105492 | 6 | ENSG00000141510 | 15 |
| Human | ENSG00000186523 | 4 | ENSG00000142512 | 7 | ENSG00000073282 | 11 |
| Human | ENSG00000145002 | 2 | ENSG00000105695 | 4 | ENSG00000078900 | 9 |
| Chimp. | ENSPTRG00000007738 | 1 | ENSPTRG00000011374 | 1 | ENSPTRG00000008703 | 1 |
| Mouse | ENSMUSG00000022544 | 1 | ENSMUSG00000051504 | 4 | ENSMUSG00000022510 | 8 |
| Rat | ENSRNOG00000002876 | 1 | ENSRNOG00000021023 | 2 | ENSRNOG00000010756 | 4 |
| Cow | ENSBTAG00000008222 | 1 | ENSBTAG00000017044 | 1 | ENSBTAG00000001069 | 1 |
| Chicken | ENSGALG00000002044 | 1 | | | ENSGALG00000007324 | 2 |
| Lizard | | | ENSACAG00000005408 | 1 | | |
| Total | 8 genes | 14 | 8 genes | 26 | 8 genes | 51 |

| Family | FAM86 | MAG | TP53 |
|---|---|---|---|
| Avg. CDS length | 726 | 1397.42 | 1277.41 |
| Avg. gene length | 22782.37 | 26049.75 | 109457.75 |
| Avg. pairwise PID (%) | 56.57 | 41.41 | 57.21 |

  1. For each gene family, the following information is given: the species name, the Ensembl identifier of each gene, the number of CDS for each gene (#), the average CDS length, the average gene length, and the average pairwise Percent Sequence Identity (PID). The average pairwise PID was computed from pairwise alignments of the CDS, obtained from the multiple alignments of their protein families provided by Ensembl-Compara
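
The average pairwise PID described in the footnote can be illustrated with a minimal sketch. It assumes the CDS are already available as rows of one multiple alignment (equal-length strings with `-` for gaps), and that PID is the number of identical columns divided by the number of columns in which both sequences are ungapped. The source does not state which denominator was used, so that choice, the function names, and the toy sequences are all assumptions made for illustration.

```python
from itertools import combinations


def pairwise_pid(a: str, b: str, gap: str = "-") -> float:
    """Percent identity between two rows of the same multiple alignment.

    Assumption: only columns where both rows are ungapped are counted.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Project the multiple alignment onto this pair by keeping the
    # columns in which neither sequence has a gap.
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


def average_pairwise_pid(rows: list[str]) -> float:
    """Mean PID over all unordered pairs of aligned sequences."""
    pairs = list(combinations(rows, 2))
    if not pairs:
        raise ValueError("need at least two aligned sequences")
    return sum(pairwise_pid(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment of three CDS (not taken from the data set).
toy_alignment = [
    "ATGGCC---TAA",
    "ATGGCAGCTTAA",
    "ATG---GCTTGA",
]

if __name__ == "__main__":
    print(f"{average_pairwise_pid(toy_alignment):.2f}")
```

Other PID conventions, for example dividing by the length of the shorter sequence or by the full alignment length, would give different values for the same alignment, so the figures in Table 1 should not be expected to match this sketch exactly.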