A method of predicting changes in human gene splicing induced by genetic variants in context of cis-acting elements
© Churbanov et al; licensee BioMed Central Ltd. 2010
Received: 17 April 2009
Accepted: 12 January 2010
Published: 12 January 2010
Polymorphic variants and mutations disrupting canonical splicing isoforms are among the leading causes of human hereditary disorders. While there is substantial evidence of aberrant splicing causing Mendelian diseases, the implication of such events in multi-genic disorders is yet to be well understood. We have developed a new tool (SpliceScan II) for predicting the effects of genetic variants on splicing and cis-regulatory elements. The novel Bayesian non-canonical 5'GC splice site (SS) sensor used in our tool allows inference on non-canonical exons.
Our tool performed favorably when compared with the existing methods in the context of genes linked to the Autism Spectrum Disorder (ASD). SpliceScan II was able to predict more aberrant splicing isoforms triggered by the mutations, as documented in DBASS5 and DBASS3 aberrant splicing databases, than other existing methods. Detrimental effects behind some of the polymorphic variations previously associated with Alzheimer's and breast cancer could be explained by changes in predicted splicing patterns.
We have developed SpliceScan II, an effective and sensitive tool for predicting the detrimental effects of genomic variants on splicing leading to Mendelian and complex hereditary disorders. The method could potentially be used to screen resequenced patient DNA to identify de novo mutations and polymorphic variants that could contribute to a genetic disorder.
Human pre-mRNA sequences are subjected to complex multi-stage modifications by splicing, and frequent variations in this process contribute to proteome diversity. During splicing, the intronic sequences are recognized and excised by the spliceosome, and the relatively short exonic sequences are joined together to form mature mRNA. The Splice Site (SS) signals at the intronic 5' end (donor) and 3' end (acceptor, polypyrimidine tract and the branch point) are necessary, but not sufficient, for accurate and efficient exon recognition by the spliceosome [1, 2]. Additional exon-proximal elements are required for proper recognition of weakly defined or alternatively committed exons. These cis-acting elements include a repertoire of Exonic Splicing Enhancers (ESEs) and Intronic Splicing Enhancers (ISEs) along with a number of Exonic Splicing Silencers (ESSs) and Intronic Splicing Silencers (ISSs). The evolutionarily fine-tuned antagonism between enhancing and silencing elements leads to the proper splicing of human pre-mRNAs. Mutations disrupting cis-acting elements and the SSs themselves, as well as mutations creating cryptic SSs and cis-acting factor binding sites, can lead to severe diseases.
Mutations affecting alternative and constitutive splicing play a major role in human hereditary disorders. More than 5,477 splicing mutations (as of July 2008) have been documented in the HGMD database, which makes this group of mutations one of the most frequent disease-causing alterations. The DBASS5 and DBASS3 databases contain 431 and 283 well annotated disease-causing aberrant splicing events, respectively. A clear understanding of the elements affecting splicing could potentially aid diagnosis and the development of novel therapeutic strategies [9, 10].
Since alterations in splicing are ubiquitous among human multi-exonic genes, it is important to understand the key regulators of this process. The 5'GC SSs, flanking <1% of human exons, were shown to play an important role in the genesis of alternative splicing in human genes and were found to accumulate in the mammalian lineage. The majority of 5'GC SS sensors built to date, i.e. computational procedures reporting how well an oligonucleotide would play the role of a SS, are based on weight matrices [12, 15]. Because the 5'GC SS is an elusive signal, it is difficult to collect a representative learning set that would facilitate building a stronger model. The importance of properly modeling the 5'GC SS comes from the observation that some mutations documented in DBASS5, such as IVS27+3_6dup(GGGT)(-96), IVS7+1G>T(-40), IVS9+1G>A(-45) and others, trigger the use of cryptic non-canonical 5'GC SSs. Despite the importance of this splicing signal, only a few splicing prediction methods, such as GeneSeqer and NetGene2, are able to score non-canonical exons.
Human introns contain many decoy exons that are similar to authentic exons but are never committed by the spliceosome, and that outnumber the real exons by an order of magnitude. The mechanisms that allow accurate discrimination between decoy exons and their authentic counterparts are poorly understood. Codon sequences contained in coding exons have particular 3-periodic compositional biases that allow gene finders, such as GenScan and HMMgene, to stitch putative coding exons together in a frame-consistent fashion with high accuracy. However, methods that rely on protein coding potential features experience severe performance loss when confronted with non-coding exons [23, 24]. On the other hand, human mutations frequently create de novo cryptic exons with no apparent coding potential, leading to severe disorders caused by aberrant splicing [7, 8]. Therefore, tools are needed that explain the effects of mutations in terms of splicing-associated signals, free of protein coding context.
Investigation of whether SSs could be predicted without relying on protein coding potential started with simple tools such as SpliceView and GENIO. The NetUTR tool was specifically constructed to predict SSs in 5' untranslated regions (UTRs), thereby addressing the problem of splicing prediction without relying on protein coding features. The Maximum Entropy Sensor has been found to be one of the most sensitive diagnostic methods for predicting the effects of mutations in human genes [7, 8, 28]. ExonScan, a tool built around the exon definition model, combines the power of the Maximum Entropy Sensor with the Logarithm of Odds (LOD) biases associated with the previously reported ESEs, ESSs and poly-G runs (known ISEs). The recent CRYP-SKIP [32, 33] tool is based on a multivariate logistic discrimination procedure that distinguishes the two aberrant splicing outcomes from DNA sequences. The Bayesian SS sensor, shown to outperform the Maximum Entropy Sensor, is an integral part of the SpliceScan tool, which is built around the SS definition model supported by the enhancers predicted with the MHMMotif tool and various other previously reported silencing and enhancing signals. SpliceScan has been found to be especially efficient on the test set of short 5' UTR fragments.
We introduce a new tool, SpliceScan II, built around the exon definition model. Unlike the previous SpliceScan method, the new tool has the option of displaying the factors contributing to the score assigned to a specific exon isoform, thus informing medical practitioners of possible changes in splicing commitment caused by polymorphic variants and mutations. We have used a much larger set of orthologous exons originating from 23 Tetrapoda organisms to train the new splicing model, following the observation that the spliceosomal and cis-acting factors stay mostly intact across vertebrates [4, 35–38] and that the genes encoding well-known RNA binding proteins involved in splicing regulation are enriched with ultraconserved elements. The SpliceScan II tool is based on the Bayesian SS sensors and uses the novel set of enhancer and silencer elements computationally predicted in Tetrapoda organisms. Having a large collection of Tetrapoda orthologous exons, we were able to collect a learning set of 5'GC SSs representative enough to train the new Bayesian 5'GC SS sensor used in our tool. We compared the performance of our tool with other methods on gene fragments annotated in DBASS5 and DBASS3 and on gene structures linked to Autism Spectrum Disorder (ASD). We further evaluated the method by predicting the effects on splicing of some of the previously reported polymorphisms associated with Alzheimer's and breast cancer, suggesting possible mechanisms behind the disease predisposition associated with such variants.
Results and Discussion
Predicting aberrant splicing isoforms
Accuracy of the tools in predicting aberrant splicing events.
Our tool was twice as accurate as other top performing methods for gene splicing prediction, such as GenScan. This result demonstrates the performance improvement on gene fragments containing aberrant splicing isoforms when a method relies on splicing factors and signals rather than protein coding potential. The ExonScan method was not able to predict many aberrant splicing isoforms, mainly because of its limited sensitivity, as discussed in the following subsection.
SpliceScan II splicing prediction accuracy
where TE is the number of accurately predicted exon boundaries, AE is the number of annotated exon boundaries in the test set and PE is the number of predicted exon boundaries.
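The formulas these symbols belong to did not survive in this text. The definitions of TE, AE and PE suggest the conventional exon-level measures, sensitivity Sn = TE/AE and specificity Sp = TE/PE; that reading is an assumption here, not a confirmed reconstruction. A minimal Python sketch under that assumption, with a hypothetical function name and made-up boundary coordinates:

```python
def exon_boundary_accuracy(predicted, annotated):
    """Return (sensitivity, specificity) for two collections of exon boundaries.

    Assumed definitions (not confirmed by the text): Sn = TE / AE, Sp = TE / PE.
    """
    predicted, annotated = set(predicted), set(annotated)
    te = len(predicted & annotated)   # TE: accurately predicted boundaries
    ae = len(annotated)               # AE: annotated boundaries in the test set
    pe = len(predicted)               # PE: predicted boundaries
    sensitivity = te / ae if ae else 0.0
    specificity = te / pe if pe else 0.0
    return sensitivity, specificity


# Illustrative coordinates only: two of three annotated boundaries are found,
# and two of four predictions are correct.
sn, sp = exon_boundary_accuracy(predicted={100, 250, 400, 900},
                                annotated={100, 250, 700})
```

Sweeping a score threshold and recomputing the pair at each step gives a sensitivity versus specificity trajectory of the kind used to compare tools.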
The winning tool should be both sensitive and specific when predicting exonic boundaries for various thresholds. Our tool appeared to be twice as sensitive as the similar NetUTR and ExonScan methods (although at the expense of much lower specificity), which would allow scoring roughly twice as many exonic isoforms. It is less sensitive than the previously constructed SpliceScan method, but the objectives of the two methods are different. The main focus of our new tool is to predict how certain internal exonic isoforms get activated, rather than to assign probabilistic scores to all putative SSs the way SpliceScan and NetUTR do.
The characteristics of the tools shown in Figure 2 emphasize the comparative aspect of their performance, and do not necessarily reflect the prediction quality in practical cases. Intronic regions in our test set are long, which negatively affects sensitivity vs. false positive rates (the majority of false positive exons are predicted within introns). We also used a split-sample test design for our tool, in which we specifically removed the extended exons associated with the test set from the learning set; this has a slightly detrimental effect on SpliceScan II performance. The mutations causing aberrant splicing events, as annotated in the DBASS databases [7, 8], are normally located close to annotated exons. In real experiments the area of focus would therefore normally be shifted to an annotated exon and its surrounding context, where the performance of our and other methods would certainly be higher than shown in Figure 2.
The 5'GC SS sensor outperforms the existing sensor designs based on weight matrices [12, 15] for sensitivity values higher than 35%. The initial artifact in the trajectory below 35% sensitivity could be explained by the fact that 34.15% of 5'GC SSs are CAGGC AAGT and AAGGC AAGT, for which our sensor returns two predefined normalized scores of 0.914 and 0.744, respectively.
Predicting variations in splicing induced by disease associated SNPs
Number of predicted splicing events induced by the same number of disease associated and control SNPs randomly selected from the loci of 238 genes linked to ASD.
Type of event
Breast cancer associated
Predicted exon corresponding to an annotated exon disappears
Predicted exon corresponding to an annotated exon changes a score
Predicted exon sharing a SS with an annotated exon changes a score
Predicted exon sharing a SS with an annotated exon disappears
New predicted cryptic exon is created sharing a SS with an annotated exon
Predicted exon disappears
New predicted cryptic exon is created
Here we tried to rank the polymorphic variations according to their possible destabilizing effect on splicing. We report [see Additional File 2 Tables S1 and S2] polymorphic variations that change an annotated exon score by more than 2%, which, according to previous work, could cause increased exon skipping or retention compared to the reference exon. According to Table 2, the number of such events induced by the disease associated SNPs is at least 3.5 times higher than in the control SNP groups, which suggests an active role of the disease associated SNPs in modulating predicted exonic strengths. These variations could consequently indicate different splicing commitment patterns for the affected exonic isoforms. Another class of events is the score change for exons overlapping with an annotated exon, which according to [7, 8, 41] could disrupt mRNA inclusion patterns for alternatively used exon isoforms sharing an annotated boundary. According to Table 2, the number of such events induced by the disease associated SNPs is also significantly increased compared to the control SNP groups, which suggests a destabilizing role for many of the changes listed [see Additional File 2 Tables S1 and S2]. A similar splicing destabilizing effect could be achieved by simply removing or creating additional exon isoforms sharing a SS with an annotated exon, and the number of such predicted events induced by the disease associated SNPs is also substantially increased compared to controls. In contrast, the number of polymorphic variations associated with the creation of new cryptic SSs or with pseudoexon deletion is approximately the same for the disease associated and control groups of SNPs, which suggests an insignificant effect on splicing for these classes of events.
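The event classes discussed above can be illustrated by comparing exon predictions for the reference and the variant allele. The following Python sketch is only an illustration, not the SpliceScan II implementation: the function name, the representation of predictions as dictionaries keyed by (start, end) coordinates, and the reading of the 2% cutoff as a relative score change are all assumptions.

```python
def classify_events(ref, alt, annotated, min_change=0.02):
    """Compare exon predictions for the reference (ref) and variant (alt) allele.

    ref, alt  : dict mapping (start, end) exon coordinates to a score
    annotated : set of (start, end) coordinates of annotated exons
    min_change: relative score change treated as notable (assumed reading of
                the 2% cutoff)
    """
    ann_starts = {start for start, _ in annotated}
    ann_ends = {end for _, end in annotated}
    events = []
    for exon in sorted(set(ref) | set(alt)):
        if exon in annotated:
            kind = "annotated exon"
        elif exon[0] in ann_starts or exon[1] in ann_ends:
            kind = "exon sharing a SS with an annotated exon"
        else:
            kind = "other predicted exon"
        if exon not in alt:
            events.append((exon, kind, "disappears"))
        elif exon not in ref:
            events.append((exon, kind, "created"))
        else:
            old, new = ref[exon], alt[exon]
            if old == new:
                continue
            change = abs(new - old) / abs(old) if old else float("inf")
            if change > min_change:
                events.append((exon, kind, "changes score"))
    return events


# Made-up coordinates and scores for illustration.
annotated = {(100, 200)}
ref = {(100, 200): 0.80, (100, 230): 0.40, (500, 560): 0.30}
alt = {(100, 200): 0.76, (100, 230): 0.40, (700, 760): 0.35}
events = classify_events(ref, alt, annotated)
```

Counting the resulting tuples by class and outcome, separately for disease associated and control SNPs, gives a tally of the kind summarized in Table 2.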
Using the set of previously predicted cis-acting elements, we were able to construct a splicing simulator capable of predicting exon score changes induced by mutations and polymorphic variants, thus elucidating possible mechanisms by which such variants lead to disorders caused by aberrant splicing.
Our tool performs favorably, compared to other splicing prediction methods, in the context of genes linked to ASD. SpliceScan II provided more accurate prediction of aberrant splicing events, as documented in DBASS5 and DBASS3, than existing methods. Although the performance of our tool in predicting the effect of mutations triggering aberrant splicing is high compared to other methods, it could not be used as a general ab initio gene structural annotation method, since the number of false positive predicted exons is high, as can be seen in Figure 2, though the fraction of reported false positives is comparable to that reported by other similar methods. Therefore, the most informative use of our method would involve screening polymorphic variants for possible splicing alterations in the context of known reference human gene structures. To accomplish this task we have created the companion Autism Candidate Gene Map (ACGMAP) database http://www.meddean.luc.edu/node/375, which contains such structures and known alternative splicing variants for candidate ASD genes.
Among the SNPs listed in dbSNP http://www.ncbi.nlm.nih.gov/projects/SNP/ as located within a gene locus, extended with 2 kb upstream and downstream flanks, only 1% are non-synonymous variants. The rarity of such SNPs prompted a search for other causative variants affecting protein function through alterations in gene regulation, where disruption of splicing regulation seems a natural choice. Here we searched for causative alterations under the realistic assumption that not all SNPs associated with a disease are causative; chances are high that these SNPs are linked with causative variants located on the same haplotype. Moreover, associated variants may compromise genome integrity through different mechanisms, such as creating missense/nonsense variants or affecting gene transcription regulatory elements. Nevertheless, we have been able to establish a number of potentially disease-causing splicing alterations.
The potentially detrimental exon score changes detected for disease associated SNPs are generally milder than the predicted changes associated with mutations annotated in the DBASS databases [7, 8]. We did not predict any disease associated SNP causing an annotated exon to disappear, an event that would most likely have highly detrimental consequences. Comparison with the predicted changes associated with same-sized control groups of randomly selected SNPs indicated that disease associated SNPs show a pronounced excess of potentially splice-disrupting variants. Careful examination of the factors contributing to an exon score variation could lead to a plausible explanation of the causative mechanisms behind the disease associated SNPs.
Sequence data collection and processing
A set of 2,333,379 Tetrapoda exons extended with 205 nt flanks from adjacent introns was obtained as previously described. Pseudoexons, defined here as regions located between decoy 3' and 5' SSs, were extracted from intronic sequences flanked by two homology-based predicted exons in a data set of human and mouse gene structures, as described previously. The decoy 3' and 5' SSs were predicted by the Bayesian SS sensor. The first and last 150 nt of every intronic sequence were excluded to avoid statistical biases associated with exon-proximal ISEs/ISSs. The sum of the decoy 3' and 5' Bayesian SS sensor scores had to exceed 0.05, where the score for each signal was on a continuous 0 to 1 scale. Pseudoexon lengths were required to be longer than 5 nt and shorter than 400 nt, since 99% of authentic internal exons reside in this length range [41, 44]. Flanking intronic regions of 205 nt were required on both sides of pseudoexons to estimate whether any elements are associated with pseudo SSs. Pseudoexons were also checked for uniqueness and were discarded if either the flanking regions of a pseudoexon or the surrounding intronic fragments were identical to those previously processed.
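The selection criteria above can be sketched as a simple filter over pairs of decoy SSs. This Python example is a minimal sketch under stated assumptions, not the authors' pipeline: the function name, the half-open intron coordinates, and the input format (lists of position and score pairs, as a SS sensor might report them) are hypothetical, and the uniqueness check is omitted.

```python
def find_pseudoexons(intron_len, acceptors, donors, min_sum=0.05,
                     min_len=6, max_len=399, edge=150, flank=205):
    """Pair decoy 3' and 5' SSs within one intron into candidate pseudoexons.

    acceptors: list of (start, score), start = first pseudoexon nucleotide
    donors   : list of (end, score), end = position just past the last
               pseudoexon nucleotide (half-open coordinates, an assumption)
    """
    # Both the 150 nt edge exclusion and the 205 nt flank requirement must
    # hold, so the larger of the two margins decides.
    margin = max(edge, flank)
    found = []
    for start, a_score in acceptors:
        for end, d_score in donors:
            length = end - start
            if not (min_len <= length <= max_len):
                continue  # keep lengths longer than 5 nt and below 400 nt
            if start < margin or end > intron_len - margin:
                continue  # too close to the authentic exons
            if a_score + d_score <= min_sum:
                continue  # combined sensor score must exceed 0.05
            found.append((start, end, a_score + d_score))
    return found


# Made-up positions and scores for a 2,000 nt intron.
candidates = find_pseudoexons(
    intron_len=2000,
    acceptors=[(300, 0.04), (100, 0.9), (1700, 0.5)],
    donors=[(420, 0.03), (800, 0.5), (1900, 0.6)],
)
```

Only the pair at positions 300 to 420 passes in this example; the other pairs fail the length or margin test.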
Through a literature search we collected a test set of 238 human genes previously linked to ASD [see Additional File 3] as a representative sample of important human genomic regions with potential implications for medical practice. We excluded all the extended exons corresponding to ASD genes from the learning set of the SpliceScan II tool for the purposes of split-sample performance testing. We constructed a test set of pre-mRNA sequences for ASD genes along with the corresponding gene structural annotation. The set contains 4,650 known canonical 5' and 3' SS pairs flanking the internal exons that need to be predicted by the various methods.
Learning the model
The middle two terms in (1) are the LOD ratios associated with the posterior probability scores returned by the Bayesian sensor for the 3' and 5' SSs. The first term in the sum (1) takes into account the evidence provided by the enhancers/silencers, so that the sum yields a valid posterior LOD ratio.
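Equation (1) itself is not preserved in this text, so the following Python sketch only illustrates the relationship the paragraph relies on: a posterior probability p corresponds to a log-odds value log(p / (1 - p)), and log-odds contributions add. The natural logarithm, the function names and the three-term sum are assumptions; any further term of equation (1) cannot be recovered from the text and is left out.

```python
import math


def prob_to_lod(p):
    """Log-odds of a probability strictly between 0 and 1 (natural log assumed)."""
    return math.log(p / (1.0 - p))


def lod_to_prob(lod):
    """Inverse of prob_to_lod."""
    return 1.0 / (1.0 + math.exp(-lod))


def partial_exon_lod(element_lod, p_acceptor, p_donor):
    """Sum of the three terms described in the text.

    element_lod: net LOD contribution of enhancers/silencers
    p_acceptor : posterior probability from the 3' SS sensor
    p_donor    : posterior probability from the 5' SS sensor
    """
    return element_lod + prob_to_lod(p_acceptor) + prob_to_lod(p_donor)


# With both SS posteriors at 0.5 the SS terms vanish and only the
# enhancer/silencer evidence remains.
example = partial_exon_lod(1.0, 0.5, 0.5)
```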
To resolve the LOD score contribution from overlapping enhancing/silencing elements, we allocate two sorted lists of elements: one with positive LOD scores and another with negative LOD scores. Where silencers overlap one another, we keep only the element with the most negative LOD score; where enhancers overlap one another, we keep only the element with the highest positive LOD score. Such a strategy allows scoring overlapping elements that are antipodes in their enhancing profiles. This way we can also choose between the shorter and longer versions of predicted cis-acting elements sharing a common prefix, relying only on the elements contributing the maximum absolute LOD score.
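One way to read this procedure is as a greedy selection within each list, strongest element first. The Python sketch below follows that reading; the greedy order, the function name and the half-open (start, end, lod) tuples are assumptions made for illustration rather than details confirmed by the text.

```python
def resolve_overlaps(elements):
    """Split elements into enhancers and silencers and drop weaker overlaps.

    elements: list of (start, end, lod) with half-open coordinates.
    Overlaps are resolved only within each list, so an enhancer and a
    silencer covering the same nucleotides are both kept.
    """
    def keep_strongest(group):
        kept = []
        # Visit elements from the largest to the smallest absolute LOD score.
        for start, end, lod in sorted(group, key=lambda e: -abs(e[2])):
            if all(end <= ks or start >= ke for ks, ke, _ in kept):
                kept.append((start, end, lod))
        return sorted(kept)

    enhancers = keep_strongest([e for e in elements if e[2] > 0])
    silencers = keep_strongest([e for e in elements if e[2] < 0])
    return enhancers, silencers


# Made-up elements: two enhancers share a prefix at position 10 and two
# silencers overlap each other.
elements = [(10, 16, 1.2), (10, 18, 2.0), (14, 20, -0.8),
            (30, 36, 0.5), (15, 21, -1.5)]
enhancers, silencers = resolve_overlaps(elements)
net_lod = sum(lod for _, _, lod in enhancers + silencers)
```

In this example the longer enhancer at position 10 wins over its shorter prefix, and the stronger silencer is retained even though it overlaps that enhancer.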
Frequencies of oligonucleotides playing the role of 5'GC SSs versus frequencies of decoy 5'GC SS-like oligonucleotides in pre-mRNA sequences, recorded for Tetrapoda organisms.
Counted as true SS in Vertebrates
Counted as decoy SS in Vertebrates
where P(SS) is the prior probability of an oligonucleotide being a 5'GC SS, P(-SS) is the prior probability of an oligonucleotide being a donor-like (decoy) signal, P(oligo|SS) is the likelihood of the oligonucleotide given a 5'GC SS, and P(oligo|-SS) is the likelihood of the oligonucleotide given a 5'GC SS-like signal.
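The four quantities defined here are the ingredients of Bayes' rule, P(SS|oligo) = P(oligo|SS)P(SS) / (P(oligo|SS)P(SS) + P(oligo|-SS)P(-SS)). The Python sketch below shows how such a posterior could be estimated from counts of the kind tabulated for true and decoy 5'GC SSs. It is an illustration under assumptions: the function name and all numbers are made up, likelihoods are taken as simple relative frequencies, P(-SS) is taken as 1 - P(SS), and the sensor's score normalization is not modeled.

```python
def posterior_ss(true_count, decoy_count, total_true, total_decoy, prior_ss):
    """Posterior probability that an oligonucleotide is a 5'GC SS.

    true_count / total_true   estimates P(oligo|SS)
    decoy_count / total_decoy estimates P(oligo|-SS)
    prior_ss is P(SS); P(-SS) is assumed to equal 1 - P(SS).
    """
    lik_ss = true_count / total_true
    lik_decoy = decoy_count / total_decoy
    numerator = lik_ss * prior_ss
    denominator = numerator + lik_decoy * (1.0 - prior_ss)
    return numerator / denominator if denominator else 0.0


# Made-up counts: the oligonucleotide is seen 50 times among 1,000 true sites
# and 20 times among 100,000 decoys, with a 1% prior.
p = posterior_ss(true_count=50, decoy_count=20,
                 total_true=1000, total_decoy=100000, prior_ss=0.01)
```

An oligonucleotide that is common among true sites and rare among decoys receives a high posterior even under a small prior, which is the behavior a sensor for a rare signal such as the 5'GC SS depends on.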
Since the 5'GC SSs are recognized by the standard U2 spliceosome and are commonly interchangeable with the canonical 5' SSs, it is reasonable to assume that they share a common context. For that reason, the splicing signals predicted by the newly constructed Bayesian 5'GC SS sensor were placed in the same probabilistic context as the normal 5' SSs, except for the different initial LOD characteristic of the 5'GC SS sensor and an additional normalization histogram used specifically to normalize the scores of exons flanked by 5'GC SSs.
Constructing the test set
We wanted to estimate the potential effect of disease associated Single Nucleotide Polymorphisms (SNPs) on splicing, since many such variants emerge from recently conducted association studies. The mechanism by which these variations influence predisposition to a disorder remains elusive in many cases. We identified 1,481 SNPs previously associated with Alzheimer's, available through the AlzGene database http://www.alzforum.org/res/com/gen/alzgene/default.asp and the literature sources cited at the Alzheimer Research Forum http://www.alzforum.org/, and 716 SNPs previously associated with breast cancer [48–53] [see Additional File 3]. We batch downloaded the sequences for the SNPs from dbSNP http://www.ncbi.nlm.nih.gov/projects/SNP/, aligned these sequences with BLASTN against Ensembl genomic contig sequences obtained from the EBI Alternative Splicing Database project http://www.ebi.ac.uk/asd/altsplice/humrel3.html, processed the results and mapped the locations of the SNPs to the genomic contig sequences. For the same genomic sequences we predicted the gene structures with BLAT using the RefSeq mRNA sequences ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/mRNA_Prot. We synchronized the SpliceScan II ab initio splicing predictions with the homology-based annotated exons and reported the changes induced by the polymorphic variations. Sets of control SNPs were randomly selected from the loci of 238 genes linked to ASD [see Additional File 3].
We would like to acknowledge the contribution of Igor Rogozin, who provided his expert opinion on the design concept at the initial phase of the project. This study would not have been possible without continuous support from the Preventive Medicine and Epidemiology department at the Loyola University Chicago Stritch School of Medicine and Dr. Manuel Diaz. This work has been supported by JDRF International grant (2008-047) and Dr. Hicks' startup fund. We would like to thank the anonymous reviewers for their suggestions, which substantially improved the quality of this manuscript.
- Mount SM: Genomic sequence, splicing, and gene annotation. Am J Hum Genet 2000, 67(4):788–792. 10.1086/303098
- Lim LP, Burge CB: A computational analysis of sequence features involved in recognition of short introns. Proc Natl Acad Sci USA 2001, 98(20):11193–11198. 10.1073/pnas.201407298
- Wang Z, Burge CB: Splicing regulation: from a parts list of regulatory elements to an integrated splicing code. RNA 2008, 14(5):802–813. 10.1261/rna.876308
- Cartegni L, Chew SL, Krainer AR: Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nat Rev Genet 2002, 3(4):285–298. 10.1038/nrg775
- Lopez-Bigas N, Audit B, Ouzounis C, Parra G, Guigo R: Are splicing mutations the most frequent cause of hereditary disease? FEBS Lett 2005, 579(9):1900–1903. 10.1016/j.febslet.2005.02.047
- Stenson PD, Ball E, Howells K, Phillips A, Mort M, Cooper DN: Human Gene Mutation Database: towards a comprehensive central mutation database. J Med Genet 2008, 45(2):124–126. 10.1136/jmg.2007.055210
- Buratti E, Chivers M, Kralovicova J, Romano M, Baralle M, Krainer AR, Vorechovsky I: Aberrant 5' splice sites in human disease genes: mutation pattern, nucleotide structure and comparison of computational tools that predict their utilization. Nucleic Acids Res 2007, 35(13):4250–4263. 10.1093/nar/gkm402
- Vorechovsky I: Aberrant 3' splice sites in human disease genes: mutation pattern, nucleotide structure and comparison of computational tools that predict their utilization. Nucleic Acids Res 2006, 34(16):4630–4641. 10.1093/nar/gkl535
- Garcia-Blanco MA, Baraniak AP, Lasda EL: Alternative splicing in disease and therapy. Nat Biotechnol 2004, 22(5):535–546. 10.1038/nbt964
- Hua Y, Vickers TA, Okunola HL, Bennett CF, Krainer AR: Antisense masking of an hnRNP A1/A2 intronic splicing silencer corrects SMN2 splicing in transgenic mice. Am J Hum Genet 2008, 82(4):834–848. 10.1016/j.ajhg.2008.01.014
- Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, Kingsmore SF, Schroth GP, Burge CB: Alternative isoform regulation in human tissue transcriptomes. Nature 2008, 456(7221):470–476. 10.1038/nature07509
- Burset M, Seledtsov IA, Solovyev VV: Analysis of canonical and non-canonical splice sites in mammalian genomes. Nucleic Acids Res 2000, 28(21):4364–4375. 10.1093/nar/28.21.4364
- Thanaraj TA, Clark F: Human GC-AG alternative intron isoforms with weak donor sites show enhanced consensus at acceptor exon positions. Nucleic Acids Res 2001, 29(12):2581–2593. 10.1093/nar/29.12.2581
- Churbanov A, Winters-Hilt S, Koonin EV, Rogozin IB: Accumulation of GC donor splice signals in mammals. Biol Direct 2008, 3: 30. 10.1186/1745-6150-3-30
- Sheth N, Roca X, Hastings ML, Roeder T, Krainer AR, Sachidanandam R: Comprehensive splice-site analysis using comparative genomics. Nucleic Acids Res 2006, 34(14):3955–3967. 10.1093/nar/gkl556
- Sparks ME, Brendel V: Incorporation of splice site probability models for non-canonical introns improves gene structure prediction in plants. Bioinformatics 2005, 21(Suppl 3):iii20–30. 10.1093/bioinformatics/bti1205
- Hebsgaard SM, Korning PG, Tolstrup N, Engelbrecht J, Rouze P, Brunak S: Splice site prediction in Arabidopsis thaliana pre-mRNA by combining local and global sequence information. Nucleic Acids Res 1996, 24(17):3439–3452. 10.1093/nar/24.17.3439
- Sun H, Chasin LA: Multiple splicing defects in an intronic false exon. Mol Cell Biol 2000, 20(17):6414–6425. 10.1128/MCB.20.17.6414-6425.2000
- Fickett JW, Tung CS: Assessment of protein coding measures. Nucleic Acids Res 1992, 20(24):6441–6450. 10.1093/nar/20.24.6441
- Burge C, Karlin S: Prediction of complete gene structures in human genomic DNA. J Mol Biol 1997, 268(1):78–94. 10.1006/jmbi.1997.0951
- Krogh A: Two methods for improving performance of an HMM and their application for gene finding. Proc Int Conf Intell Syst Mol Biol 1997, 5: 179–186.
- Krogh A: Gene finding: putting the parts together. In Guide to human genome computing. 2nd edition. Edited by: Bishop MJ. Academic Press, San Diego, CA; 1998:261–274.
- Churbanov A, Rogozin IB, Deogun JS, Ali H: Method of predicting splice sites based on signal interactions. Biol Direct 2006, 1: 10. 10.1186/1745-6150-1-10
- Eden E, Brunak S: Analysis and recognition of 5' UTR intron splice sites in human pre-mRNA. Nucleic Acids Res 2004, 32(3):1131–1142. 10.1093/nar/gkh273
- Rogozin IB, Milanesi L: Analysis of donor splice sites in different eukaryotic organisms. J Mol Evol 1997, 45(1):50–59. 10.1007/PL00006200
- Mache N, Levi P: GENIO - A Non-Redundant Eukaryotic Gene Database of Annotated Sites and Sequences. RECOMB 1998.
- Yeo G, Burge CB: Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J Comput Biol 2004, 11(2–3):377–394. 10.1089/1066527041410418
- Eng L, Coutinho G, Nahas S, Yeo G, Tanouye R, Babaei M, Dork T, Burge C, Gatti RA: Nonclassical splicing mutations in the coding and noncoding regions of the ATM Gene: maximum entropy estimates of splice junction strengths. Hum Mutat 2004, 23(1):67–76. 10.1002/humu.10295
- Wang Z, Rolish ME, Yeo G, Tung V, Mawson M, Burge CB: Systematic identification and analysis of exonic splicing silencers. Cell 2004, 119(6):831–845. 10.1016/j.cell.2004.11.010
- Fairbrother WG, Yeh RF, Sharp PA, Burge CB: Predictive identification of exonic splicing enhancers in human genes. Science 2002, 297(5583):1007–1013. 10.1126/science.1073774
- Kralovicova J, Vorechovsky I: Position-dependent repression and promotion of DQB1 intron 3 splicing by GGGG motifs. J Immunol 2006, 176(4):2381–2388.
- Divina P, Kvitkovicova A, Buratti E, Vorechovsky I: Ab initio prediction of mutation-induced cryptic splice-site activation and exon skipping. Eur J Hum Genet 2009, 17(6):759–765. 10.1038/ejhg.2008.257
- Kralovicova J, Vorechovsky I: Global control of aberrant splice-site activation by auxiliary splicing sequences: evidence for a gradient in exon and intron definition. Nucleic Acids Res 2007, 35(19):6399–6413. 10.1093/nar/gkm680
- Churbanov A: Contemporary progress in gene structure prediction. Current Genomics 2006, 7(5):283–292. 10.2174/138920206778604395
- Yeo G, Hoon S, Venkatesh B, Burge CB: Variation in sequence and organization of splicing regulatory elements in vertebrate genes. Proc Natl Acad Sci USA 2004, 101(44):15700–15705. 10.1073/pnas.0404901101
- Abril JF, Castelo R, Guigo R: Comparison of splice sites in mammals and chicken. Genome Res 2005, 15(1):111–119. 10.1101/gr.3108805
- Fairbrother WG, Yeo GW, Yeh R, Goldstein P, Mawson M, Sharp PA, Burge CB: RESCUE-ESE identifies candidate exonic splicing enhancers in vertebrate exons. Nucleic Acids Res 2004, 32(Web Server issue):W187–190. 10.1093/nar/gkh393
- Minovitsky S, Gee SL, Schokrpur S, Dubchak I, Conboy JG: The splicing regulatory element, UGCAUG, is phylogenetically and spatially conserved in introns that flank tissue-specific alternative exons. Nucleic Acids Res 2005, 33(2):714–724. 10.1093/nar/gki210
- Bejerano G, Pheasant M, Makunin I, Stephen S, Kent WJ, Mattick JS, Haussler D: Ultraconserved elements in the human genome. Science 2004, 304(5675):1321–1325. 10.1126/science.1098119
- Churbanov A, Vorechovsky I, Hicks C: Computational prediction of splicing regulatory elements shared by Tetrapoda organisms. BMC Genomics 2009, 10: 508. 10.1186/1471-2164-10-508
- Berget SM: Exon recognition in vertebrate splicing. J Biol Chem 1995, 270(6):2411–2414.
- Rebbeck TR, Ambrosone CB, Bell DA, Chanock SJ, Hayes RB, Kadlubar FF, Thomas DC: SNPs, haplotypes, and cancer: applications in molecular epidemiology. Cancer Epidemiol Biomarkers Prev 2004, 13(5):681–687.
- Zhang C, Li WH, Krainer AR, Zhang MQ: RNA landscape of evolution for optimal exon and intron discrimination. Proc Natl Acad Sci USA 2008, 105(15):5797–5802. 10.1073/pnas.0801692105
- Zhang MQ: Statistical features of human exons and their flanking regions. Hum Mol Genet 1998, 7(5):919–932. 10.1093/hmg/7.5.919
- Hwang DY, Cohen JB: U1 small nuclear RNA-promoted exon selection requires a minimal distance between the position of U1 binding and the 3' splice site across the exon. Mol Cell Biol 1997, 17(12):7099–7107.
- Felsenstein J: Inferring phylogenies. Sinauer Associates, Inc, Sunderland, Massachusetts; 2004.
- Zheng ZM, He P, Baker CC: Selection of the bovine papillomavirus type 1 nucleotide 3225 3' splice site is regulated through an exonic splicing enhancer and its juxtaposed exonic splicing suppressor. J Virol 1996, 70(7):4691–4699.
- Hunter DJ, Kraft P, Jacobs KB, Cox DG, Yeager M, Hankinson SE, Wacholder S, Wang Z, Welch R, Hutchinson A, et al.: A genome-wide association study identifies alleles in FGFR2 associated with risk of sporadic postmenopausal breast cancer. Nat Genet 2007, 39(7):870–874. 10.1038/ng2075
- Easton DF, Pooley KA, Dunning AM, Pharoah PD, Thompson D, Ballinger DG, Struewing JP, Morrison J, Field H, Luben R, et al.: Genome-wide association study identifies novel breast cancer susceptibility loci. Nature 2007, 447(7148):1087–1093. 10.1038/nature05887
- Listgarten J, Damaraju S, Poulin B, Cook L, Dufour J, Driga A, Mackey J, Wishart D, Greiner R, Zanke B: Predictive models for breast cancer susceptibility from multiple single nucleotide polymorphisms. Clin Cancer Res 2004, 10(8):2725–2737. 10.1158/1078-0432.CCR-1115-03
- Commonly studied single-nucleotide polymorphisms and breast cancer: results from the Breast Cancer Association Consortium. J Natl Cancer Inst 2006, 98(19):1382–1396.
- Stacey SN, Manolescu A, Sulem P, Rafnar T, Gudmundsson J, Gudjonsson SA, Masson G, Jakobsdottir M, Thorlacius S, Helgason A, et al.: Common variants on chromosomes 2q35 and 16q12 confer susceptibility to estrogen receptor-positive breast cancer. Nat Genet 2007, 39(7):865–869. 10.1038/ng2064
- Benusiglio PR, Lesueur F, Luccarini C, Conroy DM, Shah M, Easton DF, Day NE, Dunning AM, Pharoah PD, Ponder BA: Common ERBB2 polymorphisms and risk of breast cancer in a white British population: a case-control study. Breast Cancer Res 2005, 7(2):R204–209. 10.1186/bcr982
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990, 215(3):403–410.
- Kent WJ: BLAT--the BLAST-like alignment tool. Genome Res 2002, 12(4):656–664.
- Cogan JD, Pauciulo MW, Batchman AP, Prince MA, Robbins IM, Hedges LK, Stanton KC, Wheeler LA, Phillips JA, Loyd JE, et al.: High frequency of BMPR2 exonic deletions/duplications in familial pulmonary arterial hypertension. Am J Respir Crit Care Med 2006, 174(5):590–598. 10.1164/rccm.200602-165OC
- Parkinson GN: Fundamentals of Quadruplex Structures. In Quadruplex Nucleic Acids. Edited by: Neidle S, Balasubramanian S. Cambridge: RSC; 2006:1–30.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.