Table 1 Flow of cDNA features, generation of new features, and types of tests performed by CAFTAN

From: CAFTAN: a tool for fast mapping, and quality assessment of cDNAs

Type

Tests

Results

CAFTAN input

Queries

Derived features

Mapping

cDNA coverage

True if more than 20% of the sequence is mapped to the genome

   
 

3'utr mapping

True if delta3 < 50 bp, where delta3 is the difference between the length of the cDNA (without the poly(A) tail) and the last position mapped at the 3' end of the cDNA

   
 

5'utr mapping

True if the cDNA start mapping position in the genome is < 28 bp

 

Filtered BLAT

 
 

Internal mapping

True if Delta int < 50 bp. Delta int is the number of mismatches in the cDNA exons, taking the cDNA length into account

 

Exons

Delta5' region

Delta3' region

Mapping

Total mapping

Unmapped: when 3'utr mapping, 5'utr mapping, and Internal mapping are False

 

Exons length

Delta int

  

Mapped: when 3'utr mapping, 5'utr mapping, and Internal mapping are True

   
  

Partial mapped: one or two of the mapping tests are False
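The way the three mapping tests combine into the Total mapping result can be sketched in Python. This is a minimal illustration under stated assumptions, not CAFTAN's own code: the names `MappingTests`, `run_mapping_tests`, and `total_mapping` are hypothetical, the inputs are assumed to be precomputed values in bp, and only the thresholds and result labels are taken from the table.

```python
from dataclasses import dataclass

# Thresholds quoted in the table (bp). All names below are hypothetical.
DELTA3_MAX_BP = 50
START_MAX_BP = 28
DELTA_INT_MAX_BP = 50


@dataclass
class MappingTests:
    utr3: bool      # 3'utr mapping test
    utr5: bool      # 5'utr mapping test
    internal: bool  # Internal mapping test


def run_mapping_tests(delta3, start_position, delta_int):
    """Apply the three threshold tests to precomputed values (assumed inputs)."""
    return MappingTests(
        utr3=delta3 < DELTA3_MAX_BP,
        utr5=start_position < START_MAX_BP,
        internal=delta_int < DELTA_INT_MAX_BP,
    )


def total_mapping(tests):
    """Combine the three tests into the Total mapping label."""
    passed = sum((tests.utr3, tests.utr5, tests.internal))
    if passed == 3:
        return "Mapped"
    if passed == 0:
        return "Unmapped"
    return "Partial mapped"
```

For example, `total_mapping(run_mapping_tests(10, 40, 3))` fails only the 5' test and so yields the partial label.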

   

cDNA structure

Single exon cDNA

Multiple exon cDNA

True if exon number = 1

True if exon number > 1

BLAT Output

Filtered BLAT

Exons

 

Repeats (R) calculated for: 5' upstream, 3' downstream, Last exon

Repeats number

Complex R

Repeat type

Number

True if any of the following repeats is present: SVA R, Alu R, L1 R, LTR R, or ScRNA

Returns a string giving the type of repeat in the 3' region, depending on how many repeats were found in this region and how strongly they can affect the correct cloning of the cDNA: Complex repeats > Simple repeats > Low complexity repeats

 

Genome assembly query

Repeats overlapping with the given regions

Simple R

Alu R

L1 R

SVA R

LTR R

ScRNA R

Low complexity R
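The Complex R test and the priority order for the repeat type can be sketched as follows. This is an illustrative sketch only: the function names, the class labels used as input, and the `"None"` fallback are assumptions. Only the five complex repeat classes and the ordering Complex > Simple > Low complexity come from the table.

```python
# Repeat classes the table lists as complex; the label spellings are assumed.
COMPLEX_REPEATS = {"SVA", "Alu", "L1", "LTR", "scRNA"}


def has_complex_repeat(repeat_classes):
    """Complex R: True if any complex repeat class overlaps the region."""
    return any(r in COMPLEX_REPEATS for r in repeat_classes)


def repeat_type(repeat_classes):
    """Return the highest-priority repeat type found in a region."""
    classes = set(repeat_classes)
    if has_complex_repeat(classes):
        return "Complex"
    if "Simple" in classes:
        return "Simple"
    if "Low complexity" in classes:
        return "Low complexity"
    return "None"  # hypothetical label for a region without repeats
```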

Splice Sites (Ss)

Number of Ss

cDNA Ss type

Ss-score

Number

Returns a string with the type depending on the Ss types in the cDNA

Unknown: at least one unknown splice site

Antisense: at least one antisense Ss, no unknown Ss

U12: at least one U12 Ss, no antisense and no unknown Ss

Non_canonical: only non-canonical and canonical Ss

Canonical: all splice sites are canonical

Percent. Returns the percentage of good splice sites in a multi-exon cDNA. Good splice sites are canonical, non-canonical, and U12 splice sites

 

Genome assembly query

Canonical Ss "GT-AG"

Non_canonical Ss "GC-AG"

U12 Ss "AT-AC"

Antisense Ss "CT-GC", "GT-AT"

Unknown Ss (others)
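The splice site rules above amount to a lookup on the intron boundary dinucleotides followed by a priority order. The Python sketch below is one possible reading, not CAFTAN's implementation: the function names are hypothetical, each splice site is assumed to be given as a (donor, acceptor) dinucleotide pair, and returning `None` for a cDNA without introns is an assumption.

```python
# Dinucleotide pairs and their classes, as listed in the table.
SS_CLASSES = {
    ("GT", "AG"): "Canonical",
    ("GC", "AG"): "Non_canonical",
    ("AT", "AC"): "U12",
    ("CT", "GC"): "Antisense",
    ("GT", "AT"): "Antisense",
}
# The worst class present decides the cDNA Ss type.
PRIORITY = ["Unknown", "Antisense", "U12", "Non_canonical", "Canonical"]
GOOD = {"Canonical", "Non_canonical", "U12"}


def classify_site(donor, acceptor):
    """Class of a single splice site; anything unlisted is Unknown."""
    return SS_CLASSES.get((donor.upper(), acceptor.upper()), "Unknown")


def cdna_ss_type(sites):
    """cDNA Ss type for a list of (donor, acceptor) pairs."""
    classes = {classify_site(d, a) for d, a in sites}
    for label in PRIORITY:
        if label in classes:
            return label
    return None  # no splice sites (single exon cDNA); assumed behaviour


def ss_score(sites):
    """Ss-score: percentage of good splice sites."""
    if not sites:
        return None
    good = sum(classify_site(d, a) in GOOD for d, a in sites)
    return 100.0 * good / len(sites)
```

A cDNA with one "GT-AG" and one "GC-AG" intron would be typed Non_canonical with a score of 100, since both sites count as good.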

  


PolyA signal and tail

cDNA signal type

Poly A tail

Returns a string with the type depending on:

Canonical: there is at least one canonical signal in the cDNA

Non-canonical a: there is no canonical signal, but there is a non-canonical a one

Non-canonical b: none of the above signals is present, but there is a non-canonical b one

Non-canonical c: none of the above signals is present, but there is a non-canonical c one

True if there is a poly (A) tail in the cDNA

Polyasignal Output

 

Signals:

Canonical A[TA]TAAA

Non-canonical a [^A]ATAAA

Non-canonical b AATA[^A]A

Non-canonical c A[CG]TAAA

Tail length
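The signal patterns are written as regular expressions, so the cDNA signal type can be sketched as an ordered search. This is a hedged illustration: the function name is hypothetical, the canonical pattern is read as `A[TA]TAAA`, the whole input sequence is searched (the table does not say which part of the cDNA is scanned), and returning `None` when no signal is found is an assumption.

```python
import re

# Patterns from the table, tried in priority order.
SIGNALS = [
    ("Canonical", re.compile(r"A[TA]TAAA")),
    ("Non-canonical a", re.compile(r"[^A]ATAAA")),
    ("Non-canonical b", re.compile(r"AATA[^A]A")),
    ("Non-canonical c", re.compile(r"A[CG]TAAA")),
]


def cdna_signal_type(sequence):
    """Return the first signal type whose pattern occurs in the sequence."""
    seq = sequence.upper()
    for label, pattern in SIGNALS:
        if pattern.search(seq):
            return label
    return None  # no polyadenylation signal found (assumed behaviour)
```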

Contamination

Contamination

Returns a string with the type of contaminations

PolyA C: more than 80% of the bases are A in a 20 bp genomic window made up of the last 5 bp of the last exon plus the 15 bp that follow it

RepeatC: complex repeat contamination in the last exon or 3' end

Mixed contamination: both contaminations

No Contamination: no polyA C and no RepeatC

Polyasignal Output

Genome assembly query

Genomic polyA tail

Presence of complex Repeats
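The contamination test can be sketched as a window check on the genomic sequence followed by a combination of the two flags. The code below is a sketch under assumptions: the function names are hypothetical, the genomic sequence is assumed to be on the cDNA strand, `last_exon_end` is assumed to be a 0-based exclusive coordinate, and a window truncated by the sequence ends is treated as not contaminated. The window size, the 80% threshold, and the result labels come from the table.

```python
def genomic_polya(genome_seq, last_exon_end, before=5, after=15, threshold=0.8):
    """True if more than 80% of the 20 bp window around the exon end is A.

    The window covers the last `before` bp of the last exon plus the
    `after` bp that follow it.
    """
    start = last_exon_end - before
    if start < 0:
        return False  # window would run off the start of the sequence
    window = genome_seq[start:last_exon_end + after].upper()
    if len(window) < before + after:
        return False  # window truncated at the end of the sequence
    return window.count("A") / len(window) > threshold


def contamination_type(polya_c, repeat_c):
    """Combine the genomic poly(A) and complex repeat flags into one label."""
    if polya_c and repeat_c:
        return "Mixed contamination"
    if polya_c:
        return "PolyA C"
    if repeat_c:
        return "RepeatC"
    return "No Contamination"
```

A window holding exactly 16 As out of 20 (80%) does not count as contaminated, because the table asks for more than 80%.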