Integrating alternative splicing detection into gene prediction
© Foissac and Schiex; licensee BioMed Central Ltd. 2005
Received: 27 July 2004
Accepted: 10 February 2005
Published: 10 February 2005
Alternative splicing (AS) is now considered a major contributor to transcriptome/proteome diversity, and it cannot be neglected when annotating a new genome. Despite considerable progress in the accuracy of computational gene prediction, reliably predicting AS variants where local experimental evidence supports them remains an open challenge for gene finders.
We have used a new integrative approach that incorporates AS detection into ab initio gene prediction. This method relies on the analysis of genomically aligned transcript sequences (ESTs and/or cDNAs), and has been implemented in the dynamic programming algorithm of the graph-based gene finder EuGÈNE. Given a genomic sequence and a set of aligned transcripts, this new version identifies the set of transcripts carrying evidence of alternative splicing events and provides, in addition to the classical optimal gene prediction, alternative optimal predictions (among those consistent with the AS events detected). This allows multiple annotations of a single gene, such that each predicted variant is supported by transcript evidence (but not necessarily with full-length coverage).
This automatic combination of experimental data analysis and ab initio gene finding offers an ideal integration of alternatively spliced gene prediction inside a single annotation pipeline.
Alternative splicing (AS) is a biological process that occurs during the maturation of a pre-mRNA, allowing the production of different mature mRNA variants from a single transcription unit. AS is known to play a key role in the regulation of gene expression and in transcriptome/proteome diversity. First considered an exceptional event, AS is now thought to involve the majority of human multi-exon genes, from 50% to 74% [1–3]. This observation raises new issues for genome annotation, especially for computational gene finding, which generally provides only one exon-intron structure per sequence.
In the context of structural gene prediction, two classes of approaches are usually considered. In the first, usually called intrinsic or ab initio, the only information used for gene prediction lies in the statistical properties of the various gene elements (exons, splice sites and other biological signals). In contrast, so-called extrinsic approaches essentially rely on similarities between the sequence to annotate and other known sequences (proteins, transcripts or other genomic sequences). Several existing gene finding tools are essentially intrinsic: this is the case for Genscan, HMMgene and SLAM. For such a gene finder, the predicted gene structure is defined as an optimal prediction, that is, the most probable one according to its underlying probabilistic model.
In the presence of AS, however, a single prediction is not sufficient. One obvious possibility is to look for suboptimal predictions. For a classic HMM-based gene finder, this can be done by modifying the Viterbi algorithm so that it provides the set of the k best predictions. This approach has been applied, e.g., in HMMgene and in FGENES-M (unpublished). Another way to obtain suboptimal solutions from an HMM is HMM sampling. This method, which consists in randomly generating parses according to the posterior probabilities, has been implemented in the gene finder SLAM. Usually, a very large number of samples is needed to generate just a single prediction that differs from the optimal one. Genscan adopts a different approach and searches for alternative exons not represented in the optimal prediction. This is done using a forward-backward algorithm to identify potential exons for which the a posteriori likelihood is larger than a given threshold.
In addition to the fact that none of these exclusively intrinsic approaches can take transcript evidence into account, they suffer from two major problems, one of sensitivity and one of specificity:
First of all, these methods assume that predictions representing AS variants should have a probability very close to the optimal probability according to the underlying gene model. This is quite arguable, however, especially when the alternative structure differs significantly from the optimal one. When an AS variant shifts, e.g., from a strong to a weak or non-consensus splice site, or shows a complete coding exon skipping event, it is quite unlikely that its probability will remain in the neighborhood of the optimum, since it will not be able to incorporate the corresponding splicing or coding score.
Moreover, a strong specificity problem has been observed for this approach. Since a very large number of alternative predictions can always be produced for any sequence, it is essential to be able to distinguish those reflecting real AS variants from in silico false positives. For this purpose, and as long as prediction tools dedicated to AS sites are unavailable, the probability of a prediction alone cannot be sufficient and additional evidence is required.
In contrast to the purely intrinsic approach, the analysis of experimental data can provide useful information. More specifically, sequences of mature transcripts resulting from AS provide reliable evidence of the existence of the AS event. Large-scale studies have already been undertaken to detect AS evidence from transcript alignments and to collect it in databases such as HASDB, ASDB, ASAP, ASD, EASED or ProSplicer. Some software tools have also been designed to perform and/or exploit transcript alignment with the aim of identifying alternative gene structures. Such extrinsic annotation tools include GeneSeqer, ASPic, TAP [16, 17], and PASA. Except for GeneSeqer, which is more focused on performing spliced alignment, the three other tools adopt the same strategy: using genomically aligned transcripts, they aim to determine the exon-intron structure(s) compatible with the greatest number of transcripts. Another approach, Cluster Merge, has recently been used in the Ensembl annotation system to identify minimal sets of transcript variants compatible with genomically aligned EST evidence.
Unlike intrinsic methods, extrinsic approaches take advantage of transcript information. However, they also suffer from some limitations. First, they depend entirely on the availability of transcribed sequences, which bounds their sensitivity. With few exceptions (such as TAP, which exploits genomic sequence properties to identify gene boundaries, including e.g. a polyA site scanning step, or GeneSeqer, which contains an intrinsic splice site scoring method), they cannot predict a splice site that is not represented in a transcript-to-genome alignment, and therefore require total coverage of each gene, including all exon-intron boundaries. This can be problematic considering the fragmented nature of ESTs. Moreover, even when such methods can take advantage of total gene coverage, the CDS still has to be located, and the pure transcript predictions may not respect elementary properties of coding genes (such as the presence of an ORF w.r.t. a given frame). Furthermore, overlapping transcripts are sometimes assumed to come from the same mature mRNA and are therefore merged. This may lead to the fusion of two overlapping transcripts coming from mutually exclusive, inconsistent mRNA variants, thus forcing the prediction to respect a chimeric virtual assembly.
Finally, because experimental transcripts cannot exist for every existing gene, both intrinsic and extrinsic information are needed inside an annotation pipeline. The predictions provided by two different approaches can differ and even be inconsistent, and merging them requires careful inspection by human curators, as performed in . A fully integrative method alleviates all these problems. GrailEXP seems to be the only gene finder that has tried to go in this direction. However, it can only consider AS events leading to complete exon inclusion/retention, thus ignoring approximately half of the AS cases [8, 18]. The underlying approach remains unpublished.
To extend the domain of application of gene prediction to alternatively spliced gene structure prediction, we have designed an intrinsic/extrinsic integrative annotation method with the following aims:
For a given genomic sequence, an optimal gene structure prediction is produced, as usual.
In addition to this optimal prediction, for every transcript sequence providing evidence of AS, an optimal prediction consistent with this splicing form is also provided.
Each additional or alternative gene structure prediction has to be supported by some biological evidence.
Full-length transcript coverage is not required for a complete gene structure identification.
Each prediction satisfies the usual constraints on gene structure. A correct protein-coding gene is defined by a succession of one or more exons separated by introns flanked by splice sites. It contains a CDS between a start and a stop codon, and no in-frame stop codon in coding exons.
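The structural constraints listed in the last aim can be illustrated with a small check. The following is only a minimal sketch under stated assumptions, not the EuGÈNE implementation: the function name `is_valid_cds`, the half-open 0-based coordinates, the restriction to the forward strand and the use of ATG as the only start codon are all choices made for this example. Splice site dinucleotides are deliberately not checked, since non-consensus sites do occur.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def is_valid_cds(genome, coding_segments):
    """Check the basic constraints on a forward-strand coding structure.

    coding_segments: sorted list of (start, end) half-open 0-based intervals,
    one per coding exon part (hypothetical representation for this sketch).
    """
    # Consecutive coding segments must be separated by a non-empty intron.
    for (_, end1), (start2, _) in zip(coding_segments, coding_segments[1:]):
        if end1 >= start2:
            return False
    cds = "".join(genome[s:e] for s, e in coding_segments)
    # A CDS needs at least a start and a stop codon and a whole number of codons.
    if len(cds) < 6 or len(cds) % 3 != 0:
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    starts_ok = codons[0] == "ATG"
    stops_ok = codons[-1] in STOP_CODONS
    no_inframe_stop = not any(c in STOP_CODONS for c in codons[:-1])
    return starts_ok and stops_ok and no_inframe_stop

# Toy sequence: exon "ATGAAA", intron "GTCCCCAG", exon "TTTTAA".
toy_genome = "CC" + "ATGAAA" + "GTCCCCAG" + "TTTTAA" + "CC"
toy_ok = is_valid_cds(toy_genome, [(2, 8), (16, 22)])
```

In this toy case the spliced CDS is ATG AAA TTT TAA, so the check succeeds; shifting an exon boundary by one nucleotide breaks the reading frame and the check fails.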
Our aim is to combine the advantages of the intrinsic and extrinsic approaches in an integrative system allowing AS detection based on the analysis of genomically aligned transcript sequences. The method has been implemented in EuGÈNE-M, a new version of the Arabidopsis thaliana gene finder EuGÈNE [22, 23], and applied to a reference gene set.
Given only the genomic EST alignments, we applied EuGÈNE-M to the genomic sequence containing the spl7 gene. Since two ESTs (T04465 and AI995153) show incompatible alignments (see Methods), EuGÈNE-M computes two additional predictions, each consistent with one of them. The first alternative prediction is the same as the optimal one and corresponds to one variant; the second corresponds to the other variant.
Table 1. Analysis of the AS cases detected by EuGÈNE-M in the AraSet gene data set. The sequence IDs, genes and ESTs involved are reported first. The TIGR and AtGDB columns indicate whether each AS case is reported in these databases. The AS status is described as follows: ACC = alternative acceptor splice site, DON = alternative donor splice site, -EX = exon skipping (an entire exon is missing from the reported variant), +IN = intron inclusion (an internal part of an exon is spliced), FP = false positive AS. nt = nucleotide. Some ESTs of the At2g39780 gene in seq16 are not correctly aligned: the use of either GeneSeqer or sim4 with default options leads to a missed 4 nt exon (not involved in AS). In seq50, the additional 168 nt (+IN) intron from the EST CF652136 is flanked by GC-CT (instead of the canonical GT-AG). In seq62, the EST AV542276 from the gene At4g37040 overlaps with an intron of EST AV562725 from the neighboring gene At4g37050. In seq65, the EST BE521212 is not spliced between exons 5 and 6 of the gene At2g44100 (intron retention case), and is thus suspected of incomplete maturation. Except for CF652136, all alignments can be browsed on the AtGDB site.
[Table 1 body not recoverable; surviving description entries: skip 3 nt; non-consensus splice sites; add 27 nt; add 33 nt; skip 33 nt; skip 105 nt]
In the recent assessment of GeneSeqer on this AraSet data set, only three AS cases were reported. However, the authors only reported AS cases that were detected in GeneSeqer high-quality alignments and that produced introns differing from the AraSet annotated introns. We therefore verified that our alignments were consistent with the GeneSeqer assessment alignment data available in the Arabidopsis thaliana Genome Database AtGDB [27, 28]. We noticed an alignment difference for only one of our alternative ESTs (CF652136), which is not present in the AtGDB because of its dbEST entry date (Oct. 2003). We also checked whether the AS variants predicted by EuGÈNE-M were already reported in the AS sections of the AtGDB [26, 29] and of the TIGRdb [18, 30]. Only 3 of our detected AS predictions were already reported in both databases, and 3 were missing from both (Table 1), confirming that this methodology can help to automatically discover new potential AS cases, even on a well-studied data set.
The analysis of these AS cases confirms that AS seems to be much less frequent in A. thaliana than in Homo sapiens. Nevertheless, this estimated AS ratio is expected to increase in the future as more transcript data become available. Another interesting point is the nature of the variants: on this gene set, the majority of AS cases involve a simple alternative acceptor or donor splice site. Note, however, that since EuGÈNE-M's underlying model allows arbitrary alternative gene structures to be predicted, it is not limited to such simple AS events and can cope with complex AS events, as found in mammals. This methodology can also be integrated into other existing gene finders in which the score of a gene structure is defined as the sum of elementary scores of the signals and nucleotides involved in the gene structure (this includes HMM-based gene finders).
In this paper we have presented a new method to deal with alternative splicing in annotation and gene prediction. This integrative approach combines the advantages of an intrinsic and an extrinsic process to incorporate AS detection into ab initio gene finding. We showed that this method allows the discovery of new alternatively spliced genes, with the reliability of extrinsic annotation and the potential exhaustiveness of ab initio gene prediction.
The process that goes from the original genomic sequence and associated aligned transcripts to the AS prediction is composed of three steps, which we briefly describe here:
First, the set of genomically aligned transcripts is analysed to detect AS evidence on the basis of splicing inconsistencies between transcript variants.
Then, the graph model used in EuGÈNE to represent potential gene structures is modified to take these aligned transcripts into account. For each transcript variant, the graph used in EuGÈNE for gene structure prediction is connected to an additional parallel graph subunit where local constraints are injected according to the exon-intron information provided by the corresponding transcript alignment.
Finally, an extended version of the dynamic programming algorithm used for obtaining an optimal prediction identifies, for each graph subunit, the best prediction consistent with the corresponding transcript alignment.
Detection of AS evidence from transcript analysis
Since EuGÈNE already exploits transcript information to improve the gene prediction process, AS prediction only requires considering the transcripts that provide evidence of AS. For this purpose, we focus on inconsistencies between transcript alignments.
Transcript sequences are first aligned against the genomic sequence using a spliced alignment tool. The method does not impose a particular source transcript database or alignment tool. Transcript sequences in our analysis were extracted from the A. thaliana section of dbEST (release Dec. 2003: 190,708 entries), and aligned in two steps. For the first step we used sim4, a fast program that can deal with huge EST data sets. In the second step, we used GeneSeqer, usually more accurate at identifying splice junctions, to realign all transcripts aligned by sim4 that passed the following filtering process.
A first filtering step is performed on the basis of the transcript sequence and alignment quality. To be considered, an alignment has to satisfy constraints defined by filtering parameters. For Arabidopsis thaliana, the default parameter values are as follows: transcript length between 30 and 10000 bp, minimum alignment length of 95% of the transcript length, minimum identity score of 97%, maximum gap length of 5000 bp, maximum match length of 4000 bp. By default, and to avoid genomic contamination, unspliced transcripts are removed from the analysis. Moreover, because alignment quality is frequently weak in terminal regions, alignment extremities are shortened (by 15 bp by default).
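The default filtering rules above can be sketched as a single function. This is an illustrative sketch only: the `Alignment` record, the function name `filter_and_trim` and the parameter names are hypothetical, "gap" is interpreted as the genomic distance between two consecutive matched blocks, "match" as a single matched block, and the aligned length is approximated by the total block length (ignoring indels). Keeping at least 1 bp of a terminal block after trimming is also an assumption of this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Block = Tuple[int, int]  # half-open genomic interval of one matched block

@dataclass
class Alignment:
    transcript_length: int
    identity: float          # fraction of identical positions, in [0, 1]
    blocks: List[Block]      # matched blocks, sorted along the genome

def filter_and_trim(aln: Alignment, min_len=30, max_len=10000, min_cov=0.95,
                    min_identity=0.97, max_gap=5000, max_match=4000,
                    trim=15) -> Optional[List[Block]]:
    """Return the trimmed blocks of an accepted alignment, or None if rejected."""
    if not (min_len <= aln.transcript_length <= max_len):
        return None
    if aln.identity < min_identity:
        return None
    if len(aln.blocks) < 2:              # unspliced: possible genomic contamination
        return None
    aligned = sum(end - start for start, end in aln.blocks)
    if aligned < min_cov * aln.transcript_length:
        return None
    if any(end - start > max_match for start, end in aln.blocks):
        return None
    gaps = [s2 - e1 for (_, e1), (s2, _) in zip(aln.blocks, aln.blocks[1:])]
    if any(gap > max_gap for gap in gaps):
        return None
    # Shorten both extremities, keeping at least 1 bp of each terminal block.
    blocks = list(aln.blocks)
    first_start, first_end = blocks[0]
    last_start, last_end = blocks[-1]
    blocks[0] = (min(first_start + trim, first_end - 1), first_end)
    blocks[-1] = (last_start, max(last_end - trim, last_start + 1))
    return blocks

kept = filter_and_trim(Alignment(200, 0.99, [(100, 200), (300, 400)]))
```

With the defaults, the two-block toy alignment is kept and its outer ends are shortened by 15 bp, while the internal splice junctions are left untouched.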
The gene finder EuGÈNE
EuGÈNE is a gene finding program based on a directed acyclic graph gene model. For each nucleotide of the genomic sequence, every possible annotation of this nucleotide is represented in the graph. The graph is designed to model the whole prediction space: every consistent gene structure can be represented by a path through the graph, whose weight is defined as the sum of the weights of its edges. The minimum weight path defines the optimal prediction. Several sources of evidence are used to weight the edges of the graph, and a shortest-path dynamic programming algorithm (linear in time and space) scans the graph to provide an optimal path, which represents the best gene prediction according to the available evidence.
Structure of the initial graph
Weighting the graph
The weight of a path is the sum of the weights of all the edges in the path. The edges are weighted according to the evidence used. EuGÈNE can combine several sources of evidence, such as probabilistic coding models, the output of splice site or start codon prediction software, and sequence similarities with transcripts, proteins, or other genomic sequences. Content and transition edges c and t are penalized by weights Wc and Wt respectively, according to a weighting function characterized by parameters specifically set for the corresponding source of evidence. The set of parameters is optimized on a learning data set by maximizing the overall accuracy of the software. For more information about the weighting methods, please refer to .
Example of transcript alignment integration
A transcript-to-genome alignment can easily be taken into account by weighting the appropriate edges of the graph. To favor a gene prediction in the alignment region, the intergenic track edges included in this region can be penalized by increasing their weight. At a finer level, the exon track edges can be penalized at all positions involved in a gap of the alignment, and the intron track edges at all positions involved in a match. Thus, any gene structure prediction inconsistent with the transcript alignment information tends to be penalized. More drastically, it is possible to force the prediction to be consistent with the alignment by applying infinite penalty weights. Note that there are several such predictions, since the start codon used is unknown and the transcript may be incomplete.
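The penalization scheme just described can be sketched on a deliberately reduced model with only three tracks. The function name `alignment_penalties`, the track names and the per-position penalty arrays are assumptions of this example; the actual graph distinguishes more tracks (strands, frames, UTRs) than shown here.

```python
import math

def alignment_penalties(seq_len, blocks, penalty=math.inf):
    """Per-position penalties induced by one spliced alignment.

    blocks: sorted half-open genomic intervals of the matched regions.
    penalty=math.inf forces consistency; a finite value only discourages
    inconsistent predictions.
    """
    tracks = {name: [0.0] * seq_len for name in ("intergenic", "exon", "intron")}
    region_start, region_end = blocks[0][0], blocks[-1][1]
    # The whole aligned region is transcribed: penalize the intergenic track.
    for pos in range(region_start, region_end):
        tracks["intergenic"][pos] = penalty
    # Matched positions are exonic: penalize the intron track there.
    for start, end in blocks:
        for pos in range(start, end):
            tracks["intron"][pos] = penalty
    # Gaps between matches are introns: penalize the exon track there.
    for (_, end1), (start2, _) in zip(blocks, blocks[1:]):
        for pos in range(end1, start2):
            tracks["exon"][pos] = penalty
    return tracks

soft = alignment_penalties(10, [(2, 4), (6, 8)], penalty=5.0)
```

Positions outside the aligned region receive no penalty on any track, which reflects the remark that the start codon and the unaligned parts of the gene remain free.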
To identify the optimal path, defined as the path of lowest weight, EuGÈNE uses a dynamic programming algorithm inspired by Bellman's shortest-path algorithm, also used for HMMs in its Viterbi version. Improvements of this algorithm allow EuGÈNE to take into account constraints on gene element lengths. For simplicity, we will not describe these refinements in this paper. The algorithm associates with each vertex a variable containing the weight of the optimal path from the source vertex to this vertex, and a variable π containing the vertex that precedes it on this optimal path. The weight of this path can be computed recursively from 5' to 3': it is the minimum, over all predecessors of the vertex, of the predecessor's optimal weight plus the weight of the edge joining the predecessor to the vertex.
A short example is displayed in Figure 3. The predecessor that achieves this minimum is stored in π. At the final vertex, the best path is retrieved by a simple backtracing procedure through all π. This algorithm is linear in time and space in the length of the sequence (O(ℓ) complexity). It is important to note that the same algorithm can be used in a backward version (from the final vertex to the source vertex), computing at each vertex the weight of the best path from this vertex to the final vertex.
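The forward recursion and the backtracing step can be illustrated on a reduced model in which each sequence position carries one vertex per annotation state. This is a sketch under stated assumptions, not EuGÈNE's code: the function name `shortest_path`, the `content` and `transition` structures and the two toy states are hypothetical, and length constraints are ignored.

```python
import math

def shortest_path(content, transition):
    """Minimum-weight annotation of a sequence.

    content[i][s]: weight of annotating position i with state s.
    transition[(p, s)]: weight of going from state p to state s between two
    consecutive positions; a missing pair is a forbidden transition.
    Returns (weight of the best path, list of states along it).
    """
    states = list(content[0])
    weight = [dict(content[0])]        # best weight of a path ending in each vertex
    pi = [dict.fromkeys(states)]       # predecessor on that best path
    for i in range(1, len(content)):
        weight.append({})
        pi.append({})
        for s in states:
            best_p, best_w = None, math.inf
            for p in states:
                w = weight[i - 1][p] + transition.get((p, s), math.inf)
                if w < best_w:
                    best_p, best_w = p, w
            weight[i][s] = best_w + content[i][s]
            pi[i][s] = best_p
    last = min(states, key=lambda s: weight[-1][s])
    if math.isinf(weight[-1][last]):
        raise ValueError("no consistent path")
    # Backtrace through the stored predecessors, from 3' to 5'.
    path = [last]
    for i in range(len(content) - 1, 0, -1):
        path.append(pi[i][path[-1]])
    path.reverse()
    return weight[-1][last], path

toy_states = ("IG", "EX")
toy_content = [{"IG": 0, "EX": 2}, {"IG": 3, "EX": 0}, {"IG": 0, "EX": 2}]
toy_transition = {(p, s): (0 if p == s else 1) for p in toy_states for s in toy_states}
best_weight, best_path = shortest_path(toy_content, toy_transition)
# best_weight == 2 and best_path == ["IG", "EX", "IG"]
```

For a fixed number of states, each position is processed in constant time, which gives the linear behavior mentioned above. The backward version is obtained by running the same recursion from the last position to the first.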
AS evidences integration
Given the genomic alignment of an alternative transcript, any prediction that is optimal among all the predictions consistent with the alignment evidence will be called an alternative prediction. Given the set of previously detected alternative transcripts, we want EuGÈNE-M to produce a set of alternative predictions such that every alternative transcript has a corresponding prediction in this set. A simple way to produce such an alternative prediction would be to inject the exon-intron structure information given by the transcript alignment into the graph as described above (using infinite weights to force the prediction to strictly respect the alignment evidence), and then to run EuGÈNE on the resulting graph. However, obtaining all alternative predictions would require one run for each alternative transcript. With n the number of transcripts and l the genomic sequence length, this would result in O(ln) time complexity, which is not appropriate for long genomic sequences and numerous transcripts.
Fortunately, this complexity can be drastically reduced. The general idea is to duplicate the region of the graph involved in an alignment to create a local so-called "Parallel Graph Subunit" (PGS), connected to the main graph at its extremities. The information from each alignment is taken into account as constraints in the corresponding PGS, in such a way that finding the optimal path going through the PGS provides a corresponding optimal alternative prediction.
Extending the graph model with PGS
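For a given PGS A, if we now consider the vertices at the rightmost extremity of A, the weight of an optimal path that goes from the source vertex to the final vertex through A can be computed as the minimum, over these vertices, of the sum of the forward weight (best path from the source vertex to the vertex, through A) and the backward weight (best path from the vertex to the final vertex). From the minimizing vertex, backtracing in both directions provides an optimal path that represents an optimal prediction in accordance with the transcript alignment evidence.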
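The combination of a forward scan, a backward scan and a locally constrained subunit can be sketched as follows. This is a simplified illustration and not the EuGÈNE-M implementation: all function names are hypothetical, the PGS is represented as a list of constrained per-position weights rather than as a duplicated subgraph, and length constraints are ignored. The point of the sketch is that, once the forward table F and the backward table B of the main graph are known, the weight of each alternative prediction only requires work proportional to the length of its own alignment region.

```python
import math

INF = math.inf

def step(prev, trans, states):
    """Best weight of entering each state from the previous column."""
    return {s: min(prev[p] + trans.get((p, s), INF) for p in states) for s in states}

def forward(content, trans, states):
    """F[i][s]: best weight of a path from the start to (i, s), content included."""
    F = [dict(content[0])]
    for column in content[1:]:
        reach = step(F[-1], trans, states)
        F.append({s: reach[s] + column[s] for s in states})
    return F

def backward(content, trans, states):
    """B[i][s]: best weight of a path from (i, s) to the end, content at i excluded."""
    B = [{s: 0.0 for s in states}]
    for column, nxt in zip(reversed(content[1:]), B):
        pass  # placeholder to keep the loop below explicit
    B = [None] * len(content)
    B[-1] = {s: 0.0 for s in states}
    for i in range(len(content) - 2, -1, -1):
        B[i] = {s: min(trans.get((s, q), INF) + content[i + 1][q] + B[i + 1][q]
                       for q in states) for s in states}
    return B

def alternative_weight(F, B, trans, states, left, constrained):
    """Weight of the best path forced through a PGS starting at position `left`.

    constrained: per-position weights inside the PGS (content plus penalties).
    """
    if left == 0:
        column = dict(constrained[0])
    else:
        reach = step(F[left - 1], trans, states)
        column = {s: reach[s] + constrained[0][s] for s in states}
    for weights in constrained[1:]:
        reach = step(column, trans, states)
        column = {s: reach[s] + weights[s] for s in states}
    right = left + len(constrained) - 1
    # Join the forward weight through the PGS with the backward weight.
    return min(column[s] + B[right][s] for s in states)

pgs_states = ("IG", "EX")
pgs_trans = {(p, s): (0 if p == s else 1) for p in pgs_states for s in pgs_states}
pgs_content = [{"IG": 0, "EX": 2} for _ in range(5)]
F = forward(pgs_content, pgs_trans, pgs_states)
B = backward(pgs_content, pgs_trans, pgs_states)
forced_exon = [{"IG": INF, "EX": 2}, {"IG": INF, "EX": 2}]
alt = alternative_weight(F, B, pgs_trans, pgs_states, 1, forced_exon)
```

In this toy case the optimal prediction is entirely intergenic (weight 0), while the alternative prediction forced to be exonic at positions 1 and 2 has weight 6, the same value that a full rerun on the constrained sequence would give.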
Predictions are produced in the standard GFF format. The entire optimal annotation is first displayed, followed by the alternative ones. To enhance the readability and to avoid redundancy, for each alternative prediction the name of the corresponding transcript is mentioned and the region that differs from the optimal prediction is displayed. Besides, if several predictions are identical (regarding their predicted CDS only, UTR length differences being ignored), a single representative is displayed, along with the list of its associated transcripts.
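The merging of identical alternative predictions can be sketched as a simple grouping on the predicted CDS. The function name `group_by_cds`, the transcript identifiers and the coordinates below are invented for this example; only the grouping rule (same CDS, UTR differences ignored) comes from the description above.

```python
def group_by_cds(predictions):
    """Group alternative predictions that share exactly the same CDS.

    predictions: list of (transcript_id, cds) pairs, where cds is a list of
    (start, end) coding segments. UTRs are not part of the key, so predictions
    differing only in UTR length fall into the same group.
    Returns a list of (cds, [transcript ids]) in order of first appearance.
    """
    groups = {}
    for transcript_id, cds in predictions:
        groups.setdefault(tuple(cds), []).append(transcript_id)
    return [(list(cds), ids) for cds, ids in groups.items()]

grouped = group_by_cds([
    ("est_a", [(10, 50), (80, 120)]),
    ("est_b", [(10, 50), (90, 120)]),
    ("est_c", [(10, 50), (80, 120)]),
])
```

Here est_a and est_c support the same CDS and would be reported once, with both transcript names attached.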
The initial filtering and the identification of incompatible transcripts require O(n²) pairwise comparisons. Each comparison is itself linear in the maximum number of introns in the transcripts compared, which is typically bounded by a small constant, and the whole process is therefore in O(n²).
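The pairwise comparison can be sketched as follows. The exact incompatibility criterion is not spelled out in this section, so the rule used here is an assumption made for illustration: two spliced alignments are considered incompatible when, over the genomic interval that both of them cover, they do not imply the same set of introns. The function names and the block representation are hypothetical.

```python
from itertools import combinations

def introns(blocks):
    """Introns implied by a spliced alignment given as sorted matched blocks."""
    return [(end1, start2) for (_, end1), (start2, _) in zip(blocks, blocks[1:])]

def incompatible(a, b):
    """True if alignments a and b disagree on introns where they overlap."""
    low = max(a[0][0], b[0][0])
    high = min(a[-1][1], b[-1][1])
    if low >= high:                      # no common genomic interval
        return False
    def in_overlap(blocks):
        return {(s, e) for s, e in introns(blocks) if s < high and e > low}
    return in_overlap(a) != in_overlap(b)

def as_evidence(alignments):
    """Names of the transcripts involved in at least one incompatible pair."""
    flagged = set()
    for (name_a, a), (name_b, b) in combinations(alignments.items(), 2):
        if incompatible(a, b):
            flagged.update((name_a, name_b))
    return flagged

toy_alignments = {
    "t1": [(0, 50), (80, 120)],
    "t2": [(10, 50), (90, 120)],     # alternative acceptor with respect to t1
    "t3": [(200, 250), (300, 350)],  # elsewhere on the sequence
}
flagged = as_evidence(toy_alignments)
```

Each call to `incompatible` only touches the introns of the two alignments, and `as_evidence` enumerates all pairs, which matches the quadratic bound given above.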
The step that corresponds to the two dynamic programming scans (application of the recursive formula) requires a time and space complexity which is linear in the size of the input data. Indeed, if L is the total nucleic sequences length (genomic + kept alternative transcript), the weights of all (alternative and optimal) predictions can be computed in O(L).
For the backtracing and output step, since each alternative prediction has to be displayed in the region where it differs from the optimal one, and because this region can extend beyond the alignment region, it is not possible to obtain an algorithm that is linear in the size of the input. However, it is possible to reach a complexity that is linear in the size of the output. This can be done by a simple modification of the standard backtracing procedure that avoids a full backtrace for each prediction. This is not yet implemented in the current version of the software.
A typical run of EuGÈNE-M on an AMD Athlon 1.7 GHz takes 47 sec. for a 500 kb BAC (for which 945 transcript alignments were kept after the first quality filtering step).
1. Modrek B, Lee C: A genomic view of alternative splicing. Nat Genet 2002, 30: 13–9. doi:10.1038/ng0102-13
2. International Human Genome Sequencing Consortium: Initial sequencing and analysis of the human genome. Nature 2001, 409(6822):860–921. doi:10.1038/35057062
3. Johnson J, Castle J, Garrett-Engele P, Kan Z, Loerch P, Armour C, Santos R, Schadt E, Stoughton R, Shoemaker D: Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays. Science 2003, 302(5653):2141–4. doi:10.1126/science.1090100
4. Burge C, Karlin S: Prediction of complete gene structures in human genomic DNA. J Mol Biol 1997, 268: 78–94. doi:10.1006/jmbi.1997.0951
5. Krogh A: Using database matches with HMMGene for automated gene detection in Drosophila. Genome Res 2000, 10(4):391–7. doi:10.1101/gr.10.4.523
6. Alexandersson M, Cawley S, Pachter L: SLAM: cross-species gene finding and alignment with a generalized pair hidden Markov model. Genome Res 2003, 13(3):496–502. doi:10.1101/gr.424203
7. Cawley SL, Pachter L: HMM sampling and applications to gene finding and alternative splicing. Bioinformatics 2003, 19(Suppl 2):II36–II41.
8. Modrek B, Resch A, Grasso C, Lee C: Genome-wide detection of alternative splicing in expressed sequences of human genes. Nucleic Acids Res 2001, 29(13):2850–9. doi:10.1093/nar/29.13.2850
9. Gelfand MS, Dubchak I, Dralyuk I, Zorn M: ASDB: database of alternatively spliced genes. Nucleic Acids Res 1999, 27: 301–2. doi:10.1093/nar/27.1.301
10. Lee C, Atanelov L, Modrek B, Xing Y: ASAP: the Alternative Splicing Annotation Project. Nucleic Acids Res 2003, 31: 101–5. doi:10.1093/nar/gkg029
11. Thanaraj TA, Stamm S, Clark F, Riethoven JJ, Le Texier V, Muilu J: ASD: the Alternative Splicing Database. Nucleic Acids Res 2004, 32: D64–9. doi:10.1093/nar/gkh030
12. Pospisil H, Herrmann A, Bortfeldt RH, Reich JG: EASED: Extended Alternatively Spliced EST Database. Nucleic Acids Res 2004, 32: D70–4. doi:10.1093/nar/gkh136
13. Huang HD, Horng JT, Lee CC, Liu BJ: ProSplicer: a database of putative alternative splicing information derived from protein, mRNA and expressed sequence tag sequence data. Genome Biol 2003, 4(4):R29. doi:10.1186/gb-2003-4-4-r29
14. Usuka J, Zhu W, Brendel V: Optimal spliced alignment of homologous cDNA to a genomic DNA template. Bioinformatics 2000, 16(3):203–211. doi:10.1093/bioinformatics/16.3.203
15. Bonizzoni P, Pesole G, Rizzi R: A Method to Detect Gene Structure and Alternative Splice Sites by Agreeing ESTs to a Genomic Sequence. In Algorithms in Bioinformatics, 3rd International Workshop (WABI), LNCS. Edited by: Benson G, Page R. Springer Verlag; 2003:63–77.
16. Kan Z, Rouchka E, Gish W, States D: Gene structure prediction and alternative splicing analysis using genomically aligned ESTs. Genome Res 2001, 11(5):889–900. doi:10.1101/gr.155001
17. Kan Z, States D, Gish W: Selecting for functional alternative splices in ESTs. Genome Res 2002, 12(12):1837–45. doi:10.1101/gr.764102
18. Haas B, Delcher A, Mount S, Wortman J, Smith RJ, Hannick L, Maiti R, Ronning C, Rusch D, Town C, Salzberg S, White O: Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res 2003, 31(19):5654–66. doi:10.1093/nar/gkg770
19. Eyras E, Caccamo M, Curwen V, Clamp M: ESTGenes: alternative splicing from ESTs in Ensembl. Genome Res 2004, 14(5):976–87. doi:10.1101/gr.1862204
20. Curwen V, Eyras E, Andrews TD, Clarke L, Mongin E, Searle SMJ, Clamp M: The Ensembl automatic gene annotation system. Genome Res 2004, 14(5):942–50. doi:10.1101/gr.1858004
21. Xu Y, Uberbacher E: Automated gene identification in large-scale genomic sequences. J Comput Biol 1997, 4(3):325–38.
22. Schiex T, Moisan A, Rouzé P: EuGène, an eukaryotic gene finder that combines several type of evidence. In Computational Biology, selected papers from JOBIM' 2000, no. 2066 in LNCS. Springer Verlag; 2001:118–133.
23. EuGène web site [http://www.inra.fr/bia/T/EuGene]
24. Pavy N, Rombauts S, Déhais P, Mathé C, Ramana D, Leroy P, Rouzé P: Evaluation of gene prediction software using a genomic data set: application to Arabidopsis thaliana sequences. Bioinformatics 1999, 15(11):887–99. doi:10.1093/bioinformatics/15.11.887
25. Brendel V, Xing L, Zhu W: Gene structure prediction from consensus spliced alignment of multiple ESTs matching the same genomic locus. Bioinformatics 2004, 20(7):1157–69. doi:10.1093/bioinformatics/bth058
26. Zhu W, Schlueter S, Brendel V: Refined annotation of the Arabidopsis genome by complete expressed sequence tag mapping. Plant Physiol 2003, 132(2):469–84. doi:10.1104/pp.102.018101
27. Dong Q, Schlueter SD, Brendel V: PlantGDB, plant genome database and analysis tools. Nucleic Acids Res 2004, 32: D354–9. doi:10.1093/nar/gkh046
28. GeneSeqer evaluation on AtGDB [http://www.plantgdb.org/AtGDB/prj/BXZ03B/AraSet/AraSet-AtGDB.php]
29. Alternative splicing on AtGDB [http://www.plantgdb.org/AtGDB/prj/ZSB03PP/alternativeSplicing]
30. Arabidopsis splicing variations on TIGR db [http://www.tigr.org/tdb/e2k1/ath1/altsplicing/splicing_variations.shtml]
31. Boguski MS, Lowe TM, Tolstoshev CM: dbEST: database for expressed sequence tags. Nat Genet 1993, 4(4):332–3. doi:10.1038/ng0893-332
32. Florea L, Hartzell G, Zhang Z, Rubin G, Miller W: A computer program for aligning a cDNA sequence with a genomic DNA sequence. Genome Res 1998, 8(9):967–974.
33. Foissac S, Bardou P, Moisan A, Cros MJ, Schiex T: EUGENE'HOM: A generic similarity-based gene finder using multiple homologous sequences. Nucleic Acids Res 2003, 31(13):3742–5. doi:10.1093/nar/gkg586
34. Bellman R: Dynamic Programming. Princeton, New Jersey: Princeton Univ Press; 1957.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.