- Open Access
DEDB: a database of Drosophila melanogaster exons in splicing graph form
© Lee et al; licensee BioMed Central Ltd. 2004
- Received: 31 August 2004
- Accepted: 07 December 2004
- Published: 07 December 2004
A wealth of quality genomic and mRNA/EST sequences in recent years has provided the data required for large-scale genome-wide analysis of alternative splicing. We have capitalized on this by constructing a database that contains alternative splicing information organized as splicing graphs, where all transcripts arising from a single gene are collected, organized and classified. The splicing graph then serves as the basis for the classification of the various types of alternative splicing events.
DEDB (http://proline.bic.nus.edu.sg/dedb/index.html) is a database of Drosophila melanogaster exons obtained from FlyBase and arranged in a splicing graph form that permits the creation of simple rules for the classification of alternative splicing events. Pfam domains were also mapped onto the protein sequences, allowing users to assess the impact of alternative splicing events on domain organization.
DEDB's catalogue of splicing graphs facilitates genome-wide classification of alternative splicing events for genome analysis. The splicing graph viewer brings together genome, transcript, protein and domain information to help biologists understand the implications of alternative splicing.
- Alternative Splice
- Splice Variant
- Pfam Domain
- Alternative Splice Event
- Intron Retention
The completion of the draft sequence of the Drosophila melanogaster genome in March 2000 [1, 2] and the availability of quality annotations by FlyBase in 2002 present an excellent opportunity for the study of alternative splicing. Although the annotations themselves provide an insight into the amount of alternative splicing, they do not provide any classification of the types of alternative splicing events present. Different forms of alternative splicing have different biological bases, and the classification of alternative splicing events is critical for further work in deciphering the regulatory controls that govern these processes. To this end, we transformed all known gene structure information obtained from the genome annotations into splicing graphs based on the approach first proposed by Heber et al. in 2002. We then created simple but robust rules for classifying the splicing graphs into various alternative splicing events. These rules allow for the detection of multiple forms of alternative splicing within the same gene. To facilitate the assessment of the impact of alternative splicing on the protein product, in particular with respect to the domain organization of the protein, Pfam domains were mapped onto the transcripts using HMMER. All these data were then loaded into DEDB (Drosophila melanogaster Exon Database). To aid in visualizing these splicing graphs, a web-based splicing graph viewer was also developed. The splicing graph viewer integrates gene structure, transcript, protein and domain information into an easily understandable interface that is viewable with any current web browser. The splicing graphs as well as the alternative splicing event classifications are available for download as XML files. An XML schema is available for parsing and validation of the XML files.
Drosophila melanogaster genome annotations (release 3.2) were obtained from FlyBase as GAME XML files. Gene structure information, including the location of the transcript, the start and end positions of each exon that makes up the transcript, and the protein coding region, was parsed out, checked for consistency and then loaded into a relational database (MySQL). Pfam HMM models were retrieved from Pfam release 12 and used as the database for the hmmpfam program (part of HMMER) to search the transcript protein sequences for structural domains, with an expectation value (E-value) cutoff of less than 0.001. The results of the search were parsed, mapped onto the protein sequence and imported into the database.
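The domain filtering step described above can be sketched as follows. This is a minimal illustration only, assuming the hmmpfam output has already been parsed into simple records; the DomainHit class, the significant_hits function and the protein and domain identifiers are hypothetical names introduced for this example and are not part of DEDB or HMMER. Only the 0.001 cutoff comes from the text.

```python
from dataclasses import dataclass

# Cutoff stated in the text: hits are kept when the E-value is below 0.001.
E_VALUE_CUTOFF = 0.001


@dataclass(frozen=True)
class DomainHit:
    """One parsed domain hit (hypothetical record layout, for illustration)."""
    protein_id: str
    domain_id: str
    start: int      # 1-based, inclusive, in protein coordinates
    end: int        # 1-based, inclusive, in protein coordinates
    e_value: float


def significant_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Keep hits with an E-value strictly below the cutoff, ordered by position."""
    kept = [hit for hit in hits if hit.e_value < cutoff]
    return sorted(kept, key=lambda hit: (hit.protein_id, hit.start))


# Made-up example records; the identifiers are placeholders.
hits = [
    DomainHit("protein-PB", "DOMAIN_A", 20, 150, 2e-10),
    DomainHit("protein-PA", "DOMAIN_B", 300, 350, 0.5),
    DomainHit("protein-PA", "DOMAIN_A", 20, 280, 1e-50),
]
filtered = significant_hits(hits)
```

Keeping the coordinates in protein space mirrors the statement that results were mapped onto the protein sequence; converting them to transcript or genomic positions would be a separate step.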
Construction of the splicing graphs
Table 1. Contents of DEDB. A breakdown of the contents of the database.
Total number of transcripts
Total number of single exonic genes
Total number of multi exonic genes
Total number of splicing graphs
Total number of exons
Total number of introns
Total number of nodes
Total number of connections
Total number of splicing graphs having alternative splicing events
Total number of splicing graphs having alternative TSS events
Total number of splicing graphs having alternative TTS events
Total number of splicing graphs having alternative initiation exon events
Total number of splicing graphs having alternative termination exon events
Total number of splicing graphs having alternative acceptor events
Total number of splicing graphs having alternative donor events
Total number of splicing graphs having cassette exon events
Total number of splicing graphs having intron retention events
Total number of alternative TSS events
Total number of alternative TTS events
Total number of alternative initiation exon events
Total number of alternative termination exon events
Total number of alternative acceptor events
Total number of alternative donor events
Total number of cassette exon events
Total number of intron retention events
Classification of alternative splicing
The database together with the splicing graph viewer is freely available at http://proline.bic.nus.edu.sg/dedb/index.html. Users can query the database using FlyBase gene names, FlyBase Gene Symbols, Pfam Accession Numbers or Pfam Identifiers via the query page. Users can also query the database using BLAST searches. This is particularly useful if one wishes to find the Drosophila melanogaster homolog of a particular gene together with its alternative splicing information. Lists of splicing graphs for the various types of alternative splicing events are also provided on the website for users who are interested in a certain type of alternative splicing. Users who wish to use large subsets of the data can download the XML files available from the same site. To aid parsing and validation of the XML files, an XML schema is available. DEDB can also be accessed via links on FlyBase gene records, under the external database links section. Correspondingly, the DEDB Splicing Graph Viewer provides links back to FlyBase Gene and Annotation records, where experimental evidence for the gene structure has also been provided. Basic statistical analysis of the database can be found at the DEDB website (http://proline.bic.nus.edu.sg/dedb/stats.html).
Splicing graph viewer
Visualization of alternative splicing
By condensing all the splice variants into a single graph, where each splice variant is a path through the graph, users can quickly establish the types and effects of the alternative splicing events present in the gene. Bifurcations, which denote alternative splicing events, can be picked out far more quickly than with the traditional approach of presenting a separate schematic representation of each splice variant, where the user has to correlate the splicing patterns across the transcript diagrams to determine the impact and type of alternative splicing. The classical approach is particularly tedious when splice variants are numerous (for example the Drosophila moe gene, splicing graph 916, on the DEDB methodology link), as the user has to correlate large amounts of data to comprehend all the alternative splicing events taking place. The DEDB schematic representation of the splicing graph differs from the one proposed by Heber et al. and implemented in the Alternative Splicing Gallery (ASG). The original representation used a single linear block representation of exons connected by lines representing introns, with alternative donor and acceptor sites as well as intron retention represented by single blocks. Instead, we have chosen to depict all the exons individually, as we felt that this is more intuitive for biologists and makes the impact of the alternative splicing more pronounced. Protein sequence details such as the start and end of translation as well as detected Pfam domains are presented by the splicing graph viewer. This allows biologists to infer the impact of alternative splicing on the corresponding protein sequences as well as on the domain organization. Users can also download FASTA sequences of specific entities such as introns and exons for other analyses.
Biologists may also be interested in the Drosophila melanogaster homolog of their gene of interest, which can be found through a BLAST search on the DEDB query page. The splicing graph of the Drosophila melanogaster homolog may provide insights into the possible splice variants of the gene of interest. It could also provide information on the level of conservation of alternative splicing between orthologs.
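The idea that each splice variant is a path through a single graph, and that bifurcations mark alternative splicing, can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the DEDB implementation: exons are represented as (start, end) tuples, each distinct exon is one node, each observed exon-to-exon junction is one connection, and the function names and coordinates are hypothetical.

```python
from collections import defaultdict


def build_splicing_graph(transcripts):
    """Merge transcripts into one graph.

    transcripts maps a transcript name to its exons, given as (start, end)
    tuples in 5' to 3' order. Returns (nodes, edges): nodes is the set of
    distinct exons, edges maps an exon to the set of exons that directly
    follow it in at least one transcript.
    """
    nodes = set()
    edges = defaultdict(set)
    for exons in transcripts.values():
        nodes.update(exons)
        for upstream, downstream in zip(exons, exons[1:]):
            edges[upstream].add(downstream)
    return nodes, dict(edges)


def bifurcations(edges):
    """Return the exons that are followed by more than one distinct exon."""
    return {exon: nxt for exon, nxt in edges.items() if len(nxt) > 1}


# Two made-up transcripts of one gene; the second skips the middle exon.
example = {
    "variant_A": [(100, 200), (300, 400), (500, 600)],
    "variant_B": [(100, 200), (500, 600)],
}
nodes, edges = build_splicing_graph(example)
branch_points = bifurcations(edges)
```

Because shared exons collapse into a single node, the two variants above yield three nodes and three connections, and the only branch point is the first exon.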
Classified splicing dataset
The use of splicing graphs allows the creation of simple but robust rules that can detect multiple distinct alternative splicing events within the same gene. Traditional approaches usually require the construction of more complex rules. For example, the detection of a cassette exon in the traditional approach requires that an internal exon be checked against all the introns in all the splice variants to detect instances where the exon falls within an intron. This process has to be repeated for each exon against all the introns, resulting in a long and complex computation. Furthermore, as the exon could be found in several splice variants, the detected cassette exon could be redundant and additional steps have to be taken to remove this redundancy. All of this is avoided by the splicing graph representation, as it is a condensed view of all the splice variants arising from a single gene. Classification of the alternative splicing types in Drosophila melanogaster allows users to target specific types of alternative splicing events for analysis. This is useful as the various types of alternative splicing have different biological bases and therefore exhibit different phenotypes. The analysis of these phenotypes will be greatly aided by a set of data that is specific to one form of alternative splicing, as provided by DEDB. The availability of clean datasets of alternative splicing events [11, 12] has proved to be useful in providing insights into the phenomenon of alternative splicing. The data available from DEDB should likewise be useful to users studying alternative splicing as a major factor leading to complexity in higher eukaryotes.
A summary of the alternative splicing events in DEDB is presented in Table 1. Detailed statistical information (general statistics, exon and intron length statistics and motif analysis) is available from the "Stats" page of the website (Lee, Tan and Ranganathan, unpublished results). Note, however, that the gene models are constructed with far more 5' ESTs than 3' ESTs, and the results must be viewed in the light of the available experimental EST data.
Of the total of 13,222 genes in DEDB, 2,646 (20%) are alternatively spliced. This is significantly less than the amount of alternative splicing found in higher eukaryotes like humans, but sufficient to indicate that alternative splicing is a common phenomenon in Drosophila melanogaster. The amount of alternative splicing increases to 24.4% if we consider transcript diversity in the 10,848 multi-exonic genes alone.
Failure of intron definition is more likely to result in intron retention, whereas failure of exon definition leads to cassette exons. Initial analysis of the DEDB data indicates a bias towards cassette exons (1,228) over intron retention (983) events, suggesting that exon definition is less stringent than intron definition. The short introns in Drosophila melanogaster are also thought to result in greater intron definition. The data observed could be due to the splicing machinery adopting a definition model dependent on the length of the intron or exon in question. This is supported by the fact that cassette exons tend to be flanked by introns far longer than the mean value (exon and intron length statistics available via the "Stats" link). The median cassette exon length is 150 bp, in contrast with the flanking 5' and 3' introns, which are 653 bp and 639 bp respectively. The reverse is also true for intron retention, where the median length of the retained intron is 101 bp while the flanking 5' and 3' exons are 163 bp and 261 bp respectively.
Information content analysis indicates that alternative donor and acceptor sites (with mean values of 5.95 and 5.61 bits respectively) possess less information than constitutive sites (9.74 and 8.52 bits respectively; additional data available on the website). This observation is consistent with the general notion that alternatively spliced exons exhibit splicing motifs deviating more from the consensus motifs. Cassette exons (CE) and intron retentions (IR) also show lower mean individual information content on both donor (CE: 6.76 bits; IR: 5.39 bits) and acceptor sites (CE: 7.08 bits; IR: 6.19 bits) as compared to constitutive exons.
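To illustrate why a graph-based rule can be simpler than comparing every exon against every intron, the following sketch detects cassette exons directly from the connections of a splicing graph. The rule used here (an exon B is a cassette exon when the connections A to B, B to C and A to C all exist) is an assumption made for this example and is not necessarily the rule used in DEDB; the function name and exon labels are hypothetical.

```python
def cassette_exons(edges):
    """Return exons that can be either included or skipped.

    edges maps an exon label to the set of exons that directly follow it.
    An exon B is reported when some pair of exons A and C is joined both
    through B (A -> B -> C) and directly (A -> C).
    """
    found = set()
    for a, successors in edges.items():
        for b in successors:
            for c in edges.get(b, ()):
                if c in successors:
                    found.add(b)
    return found


# Made-up graph: exon e2 is skipped in one variant, e1 -> e3.
example_edges = {
    "e1": {"e2", "e3"},
    "e2": {"e3"},
    "e3": {"e4"},
}
skipped = cassette_exons(example_edges)
```

Since the result is a set of graph nodes, an exon that appears in several splice variants is reported once, which is the redundancy problem the text says the condensed representation avoids.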
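The two percentages quoted above can be reproduced from the gene counts in the text. The short check below assumes that all 2,646 alternatively spliced genes are multi-exonic, which is what the 24.4% figure implies; the variable names are illustrative.

```python
# Gene counts quoted in the text.
total_genes = 13222
alternatively_spliced = 2646
multi_exonic_genes = 10848

# Fraction of all genes that are alternatively spliced.
fraction_of_all = alternatively_spliced / total_genes

# Fraction of multi-exonic genes, assuming every alternatively
# spliced gene has more than one exon.
fraction_of_multi_exonic = alternatively_spliced / multi_exonic_genes

print(f"{fraction_of_all:.1%} of all genes")
print(f"{fraction_of_multi_exonic:.1%} of multi-exonic genes")
```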
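As a rough illustration of what a value in bits means for a splice site motif, the sketch below computes the Shannon information content of a set of aligned DNA sites. It is a simplified stand-in and may differ from the analysis behind the figures above: it assumes a uniform background (2 bits maximum per position), applies no small-sample correction, and scores a whole alignment rather than the individual sites that the "mean individual information content" refers to. The function name and the example sequences are hypothetical.

```python
import math
from collections import Counter


def information_content(sites):
    """Total information, in bits, of equal-length aligned DNA sites.

    For each alignment column the information is 2 - H, where H is the
    Shannon entropy of the observed base frequencies in that column.
    """
    if not sites or len({len(site) for site in sites}) != 1:
        raise ValueError("sites must be a non-empty list of equal-length strings")
    total = 0.0
    for column in zip(*sites):
        counts = Counter(column)
        entropy = -sum(
            (n / len(column)) * math.log2(n / len(column))
            for n in counts.values()
        )
        total += 2.0 - entropy
    return total


# A fully conserved dinucleotide carries 2 bits per position.
conserved = information_content(["GT", "GT", "GT"])
# A position where all four bases are equally frequent carries none.
uninformative = information_content(["A", "C", "G", "T"])
```

Under this measure, sites that deviate more from the consensus give flatter base frequencies per column and therefore fewer bits, which is the direction of the difference reported between alternative and constitutive sites.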
The addition of Pfam domain information allows users to assess the impact of alternative splicing events on the proteins generated, enabling correlations not possible with the genome annotations alone.
Future work will focus on integrating other relevant information into the splicing graphs, such as three-dimensional structural information as well as DEDB analysis results. Expansion of the splicing graph representation available in DEDB to other organisms is also underway.
The data housed in DEDB are organized as splicing graphs, which allows for ease of alternative splicing classification. This has allowed DEDB to provide clean sets of data containing specific types of alternative splicing events. These specific sets of data could prove useful in understanding the biological basis of alternative splicing, because different forms of alternative splicing have different biological bases. The splicing graph viewer allows biologists to quickly and intuitively understand the effects of alternative splicing on a gene of interest, thus aiding their research.
The database is available at http://proline.bic.nus.edu.sg/dedb/index.html and is suitable for most graphical web browsers. XML files of the data contained in the database are also available, together with an XML schema to aid parsing.
We would like to thank Dr Donald Gilbert for his help in creating links to DEDB from the FlyBase gene records. We would also like to thank the bioinformatics team at the Department of Biochemistry, National University of Singapore and the anonymous reviewers for their helpful comments and discussions. Bernett Lee is grateful to the National University of Singapore for the award of an Agency for Science, Technology and Research, Singapore (A-STAR) scholarship.
- Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, Scherer SE, Li PW, Hoskins RA, Galle RF, George RA, Lewis SE, Richards S, Ashburner M, Henderson SN, Sutton GG, Wortman JR, Yandell MD, Zhang Q, Chen LX, Brandon RC, Rogers YH, Blazej RG, Champe M, Pfeiffer BD, Wan KH, Doyle C, Baxter EG, Helt G, Nelson CR, Gabor GL, Abril JF, Agbayani A, An HJ, Andrews-Pfannkoch C, Baldwin D, Ballew RM, Basu A, Baxendale J, Bayraktaroglu L, Beasley EM, Beeson KY, Benos PV, Berman BP, Bhandari D, Bolshakov S, Borkova D, Botchan MR, Bouck J, Brokstein P, Brottier P, Burtis KC, Busam DA, Butler H, Cadieu E, Center A, Chandra I, Cherry JM, Cawley S, Dahlke C, Davenport LB, Davies P, de Pablos B, Delcher A, Deng Z, Mays AD, Dew I, Dietz SM, Dodson K, Doup LE, Downes M, Dugan-Rocha S, Dunkov BC, Dunn P, Durbin KJ, Evangelista CC, Ferraz C, Ferriera S, Fleischmann W, Fosler C, Gabrielian AE, Garg NS, Gelbart WM, Glasser K, Glodek A, Gong F, Gorrell JH, Gu Z, Guan P, Harris M, Harris NL, Harvey D, Heiman TJ, Hernandez JR, Houck J, Hostin D, Houston KA, Howland TJ, Wei MH, Ibegwam C, Jalali M, Kalush F, Karpen GH, Ke Z, Kennison JA, Ketchum KA, Kimmel BE, Kodira CD, Kraft C, Kravitz S, Kulp D, Lai Z, Lasko P, Lei Y, Levitsky AA, Li J, Li Z, Liang Y, Lin X, Liu X, Mattei B, McIntosh TC, McLeod MP, McPherson D, Merkulov G, Milshina NV, Mobarry C, Morris J, Moshrefi A, Mount SM, Moy M, Murphy B, Murphy L, Muzny DM, Nelson DL, Nelson DR, Nelson KA, Nixon K, Nusskern DR, Pacleb JM, Palazzolo M, Pittman GS, Pan S, Pollard J, Puri V, Reese MG, Reinert K, Remington K, Saunders RD, Scheeler F, Shen H, Shue BC, Siden-Kiamos I, Simpson M, Skupski MP, Smith T, Spier E, Spradling AC, Stapleton M, Strong R, Sun E, Svirskas R, Tector C, Turner R, Venter E, Wang AH, Wang X, Wang ZY, Wassarman DA, Weinstock GM, Weissenbach J, Williams SM, Woodage T, Worley KC, Wu D, Yang S, Yao QA, Ye J, Yeh RF, Zaveri JS, Zhan M, Zhang G, Zhao Q, Zheng L, Zheng XH, Zhong FN, Zhong W, Zhou X, Zhu S, Zhu X, Smith HO, Gibbs RA, Myers EW, Rubin GM, Venter JC: The genome sequence of Drosophila melanogaster. Science 2000, 287: 2185–2195. 10.1126/science.287.5461.2185
- Hoskins RA, Smith CD, Carlson JW, Carvalho AB, Halpern A, Kaminker JS, Kennedy C, Mungall CJ, Sullivan BA, Sutton GG, Yasuhara JC, Wakimoto BT, Myers EW, Celniker SE, Rubin GM, Karpen GH: Heterochromatic sequences in a Drosophila whole-genome shotgun assembly. Genome Biol 2002, 3: RESEARCH0085. 10.1186/gb-2002-3-12-research0085
- Misra S, Crosby MA, Mungall CJ, Matthews BB, Campbell KS, Hradecky P, Huang Y, Kaminker JS, Millburn GH, Prochnik SE, Smith CD, Tupy JL, Whitfied EJ, Bayraktaroglu L, Berman BP, Bettencourt BR, Celniker SE, de Grey AD, Drysdale RA, Harris NL, Richter J, Russo S, Schroeder AJ, Shu SQ, Stapleton M, Yamada C, Ashburner M, Gelbart WM, Rubin GM, Lewis SE: Annotation of the Drosophila melanogaster euchromatic genome: a systematic review. Genome Biol 2002, 3: RESEARCH0083. 10.1186/gb-2002-3-12-research0083
- Heber S, Alekseyev M, Sze SH, Tang H, Pevzner PA: Splicing graphs and EST assembly problem. Bioinformatics 2002, 18 Suppl 1: S181–8.
- Bateman A, Birney E, Cerruti L, Durbin R, Etwiller L, Eddy SR, Griffiths-Jones S, Howe KL, Marshall M, Sonnhammer EL: The Pfam protein families database. Nucleic Acids Res 2002, 30: 276–280. 10.1093/nar/30.1.276
- Eddy SR: Profile hidden Markov models. Bioinformatics 1998, 14: 755–763. 10.1093/bioinformatics/14.9.755
- DEDB: Drosophila melanogaster Exon Database [http://proline.bic.nus.edu.sg/dedb/index.html]
- The FlyBase Consortium: The FlyBase database of the Drosophila genome projects and community literature. Nucleic Acids Res 2003, 31: 172–175. 10.1093/nar/gkg094
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990, 215: 403–410. 10.1006/jmbi.1990.9999
- Leipzig J, Pevzner P, Heber S: The Alternative Splicing Gallery (ASG): bridging the gap between genome and transcriptome. Nucleic Acids Res 2004, 32: 3977–3983. 10.1093/nar/gkh731
- Lee C, Atanelov L, Modrek B, Xing Y: ASAP: the Alternative Splicing Annotation Project. Nucleic Acids Res 2003, 31: 101–105. 10.1093/nar/gkg029
- Thanaraj TA, Stamm S, Clark F, Riethoven JJ, Le Texier V, Muilu J: ASD: the Alternative Splicing Database. Nucleic Acids Res 2004, 32 Database issue: D64–9. 10.1093/nar/gkh030
- Roca X, Sachidanandam R, Krainer AR: Intrinsic differences between authentic and cryptic 5' splice sites. Nucleic Acids Res 2003, 31: 6321–6333. 10.1093/nar/gkg830
- Brett D, Hanke J, Lehmann G, Haase S, Delbruck S, Krueger S, Reich J, Bork P: EST comparison indicates 38% of human mRNAs contain possible alternative splice forms. FEBS Lett 2000, 474: 83–86. 10.1016/S0014-5793(00)01581-7
- Berget SM: Exon recognition in vertebrate splicing. J Biol Chem 1995, 270: 2411–2414.
- Itoh H, Washio T, Tomita M: Computational comparative analyses of alternative splicing regulation using full-length cDNA of various eukaryotes. RNA 2004, 10: 1005–1018. 10.1261/rna.5221604
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.