Djinn Lite: a tool for customised gene transcript modelling, annotation-data enrichment and exploration
© Teber et al; licensee BioMed Central Ltd. 2006
Received: 10 August 2005
Accepted: 23 January 2006
Published: 23 January 2006
Data on genetic variation, transcriptomes and proteomes are becoming available at an ever-increasing rate. Similarly, a growing variety of bioinformatic programs are becoming available from many diverse sources, designed to identify a myriad of sequence patterns considered to have potential biological importance within intergenic regions, genes, transcripts, and proteins. However, biologists require easy-to-use, uncomplicated tools to integrate this information and to visualise and print gene annotations. At present, integrating this information usually requires considerable informatics skills and comprehensive knowledge of the data formats involved. Tools are also needed to explore gene model variants by allowing users to create alternative transcript models using novel combinations of exons not necessarily represented in current database deposits of mRNA/cDNA sequences.
Djinn Lite is designed to be an intuitive program for storing and visually exploring custom annotations relating to a eukaryotic gene sequence and its modelled gene products. In particular, it is helpful in developing hypotheses regarding alternative splicing of transcripts by allowing the construction of model transcripts and inspection of their resulting translations. It allows a gene and its gene products to be viewed in one synchronised graphical view, from which one can drill down into sequence-related data. Colour highlighting of selected sequences and added annotations further supports the exploration and visualisation of sequence regions and motifs known or predicted to be biologically significant.
Gene annotation remains an ongoing and challenging task that will continue as gene structures, gene transcription repertoires, disease loci, protein products and their interactions become more precisely defined. Djinn Lite offers an accessible interface to help accumulate, enrich, and individualise sequence annotations relating to a gene, its transcripts and translations. The mechanism of transcript definition and creation, and the subsequent navigation and exploration of features, are intuitive and demand only a short learning curve. Ultimately, Djinn Lite can form the basis for providing valuable clues to plan new experiments, and provides storage of sequences and annotations dedicated to customised projects. The application is appropriate for Windows 98-ME-2000-XP-2003 operating systems.
There remains intense interest in the annotation of genomes, and work continues to elucidate all human genes, their haplotypes [1, 2] and the human transcriptome [3, 4]. It is estimated that approximately 50% of the human transcriptome is not yet annotated. According to the CCDS project, as of March 2005, just over 13,000 genes can be reliably said to code for protein, which represents approximately half of the human genes. Inconsistencies in gene annotations arise because different human genome sequence database centers employ different methods for identifying the locations of genes and generating coding sequences, specifically different computational methods and gene-finding programs. "Ab initio" gene-finding programs detect genes by looking for distinct patterns that define where a gene begins and ends. Alternatively, comparative gene-finding programs look for genes by comparing segments of sequence with those of known genes and proteins. Current gene prediction algorithms also focus largely on predicting coding regions and less on untranslated regions [7, 8]. Computational gene-finding methods alone are insufficient to generate accurate gene structures. Providing accurate gene annotations will take the coordinated efforts of experimentalists and computational biologists to learn from the inconsistencies between gene structures generated by manual curation and automated methods.
Also, because only limited numbers of tissue-specific EST and mRNA sequences have been deposited in public databases, and because cDNA construction protocols differ, tissue-specific transcript constructs may be missed. Again, biological experiments may be necessary to confirm any transcript constructions, but programs which can suggest best guesses and ranges of transcript options could be extremely valuable.
Manual gene curation is also necessitated by errors and inaccuracies that may exist in the main sequence databases [11, 12]. It is not uncommon for different sources to give different names to the same regions (e.g. exons), particularly between literature and genome database sources (e.g. the PTGS1/COX1 gene) [13–15]. These situations make manual curation all the more important, as biologists attempt to resolve inconsistencies.
In the past 30 years, genetic studies of multifactorial human diseases have identified approximately 50 genes and their allelic variants. However, it is estimated that there are likely to be hundreds of susceptibility loci that increase the risk for each common disease. Therefore, gene annotation remains an ongoing and challenging task that will be driven forward continually as human gene structures and disease loci become more precisely defined. Efforts are underway to develop a haplotype map of the human genome [1, 2]. The haplotype map or "HapMap" aims to provide researchers with information to find genes and genetic variations that affect health and disease. Manual annotators will play a critical role in cataloguing how different components interact and contribute to biological processes, diseases and physiological complexity. Thus, investigators require tools to store, test and analyse the combinations of alleles experimentally observed to be inherited as units from DNA/polymorphism screenings.
It is estimated that 75% of alternative splicing events change the protein coding sequence [17, 18]. It is thought that approximately a third to half of all human genes produce multiple transcript variants. Alternative splicing can often produce protein isoforms with different domain compositions and motifs. It will be vital to have tools that can model variant transcripts to test for splice-site plasticity and disease-forming mis-splicing events.
Table 1. A sample of sources of experimentally and computationally derived gene annotations, and of sequence pattern discovery programs
AltSplice and AltExtron – Computationally derived splicing events.
AEdb – splicing events manually generated from the literature.
Polymorphic sequence variations
Predicting RNA secondary structures
Predicting protein motif/domains/structural similarities
Human gene mutations
Predicting transcription factor binding sites
Predicting exonic splicing enhancer sites
Predicting promoter elements
SIGNAL SCAN http://thr.cit.nih.gov/molbio/signal/
Predicting post-translational modifications
NetAcet – N-terminal acetylation in eukaryotic proteins.
NetNGlyc – N-linked glycosylation sites in human proteins.
NetOGlyc – O-GalNAc (mucin type) glycosylation sites in mammalian proteins.
NetPhos – Serine, threonine and tyrosine phosphorylation sites in eukaryotic proteins.
ProP – Arginine and lysine propeptide cleavage sites in eukaryotic protein sequences.
YinOYang – O-(beta)-GlcNAc glycosylation and Yin-Yang sites (intracellular/nuclear proteins).
All above web server programs can be found at http://www.cbs.dtu.dk/services/
Protein-protein interaction data
A number of sophisticated and powerful sequence annotation and visualisation tools are available, including ARTEMIS, SeqVISTA, and Genotator [24, 25]. These tools principally focus on features that are related to segments of nucleotide sequences or small genomes, translated amino acid sequences and their annotations. Many of the features that are necessary for visualising sequences, such as ease of navigation, colour coding, and dynamic linking of macro-level depictions with detailed sequences, exist in these programs. They also contain additional integrated functionality which can be useful to experienced bioinformaticians, including exon prediction, dbEST searches, protein secondary structure predictions and others [22–24].
Genome browsers, on the other hand, are suited to large-scale annotation and analysis of genomes and include UCSC's Genome Browser, the ENSEMBL project viewer, NCBI Map Viewer and GeneViTo. UCSC's Genome Browser can display requested portions of genomes by zooming in/out to any scale, together with many aligned annotation tracks, including known genes, predicted genes, ESTs, mRNAs, CpG islands, assembly gaps and coverage, chromosomal bands, cross-species homologies, and tracks that have been deposited by others. Users can add and view their own custom tracks; however, this can require users to place annotations in formatted files before uploading them into the browser.
NCBI Human Map viewer also has an additional function called Model Maker which is able to show the exons provided by GenBank, mRNAs, ESTs, and gene predictions. However, the numbering and alignments between transcript and genomic sequence, between transcript and protein are non-intuitive, requiring jumping between screens to obtain position number associations. Another key limitation of web-based visualisation and annotation tools is the available printing options. These are limited to printing only what is available on the page or images, and can make copying/pasting sequences and alignments cumbersome.
The number and type of annotations can vary, and arranging annotations in a non-confusing manner is paramount for biologists who are not conversant with bioinformatics. Ultimately, excessive functionality, formatting of input files, genome-wide analysis, and inflexible printing can be overwhelming for biologists whose key focus is to judiciously conduct wet laboratory experiments on their gene of interest. Thus, unneeded complexities related to gene annotations need to be hidden from view, and software tools need to be less complicated, in order to help with integrating, storing and visualising annotations as biologists gradually learn more about their gene of interest.
We have developed Djinn Lite for those users not requiring significant bioinformatics skills in customising gene/transcript annotations. The application is appropriate for Windows 98-ME-2000-XP-2003 operating systems.
Input sequence, defining transcripts and coding regions
Djinn Lite invokes a wizard to allow for the input of a raw nucleotide sequence. The raw nucleotide sequence can be a genomic, pre-mRNA, mRNA, or partial or complete protein coding region (CDS) sequence. For example, raw nucleotide sequences can be obtained from any sequence database, including NCBI Entrez Gene, Ensembl, GeneCards and Celera, and then cut and pasted into Djinn Lite's main sequence input text form as part of the initial wizard.
Djinn Lite uses the term "Nucleotide Regions" to describe core transcriptional regions, such as exons and 5' and 3' untranslated regions. Upon the input of a main sequence, key regions may be defined either by providing a start and end nucleotide position or by selecting the nucleotides with the mouse by clicking and dragging. Alternatively, regions may be assigned after carrying out a nucleotide string search match. Textual information describing the source of the information may be added to a "Reference" entry field pertaining to a defined "Nucleotide Region".
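The two ways of defining a region described above (explicit start and end positions, or a string search) can be illustrated with a minimal Python sketch. This is not Djinn Lite's own code, which is written in Visual Basic 6; the `NucleotideRegion` class, the `regions_from_search` helper and the 1-based inclusive numbering are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NucleotideRegion:
    """A named span on the main sequence (hypothetical; 1-based, inclusive)."""
    name: str
    start: int
    end: int
    reference: str = ""  # free-text note on the source of the definition

    def extract(self, main_sequence: str) -> str:
        """Return the nucleotides covered by this region."""
        return main_sequence[self.start - 1:self.end]


def regions_from_search(main_sequence: str, query: str, name: str,
                        reference: str = "") -> List[NucleotideRegion]:
    """Return one region per exact, case-insensitive match of `query`."""
    seq, q = main_sequence.upper(), query.upper()
    hits: List[NucleotideRegion] = []
    pos = seq.find(q)
    while q and pos != -1:
        hits.append(NucleotideRegion(name, pos + 1, pos + len(q), reference))
        pos = seq.find(q, pos + 1)  # advance by one so overlapping matches are kept
    return hits


main = "GGATGCCCTAAGG"
exon = NucleotideRegion("exon 1", 3, 8, reference="defined by position")
start_codons = regions_from_search(main, "atg", "start codon")
```

Here `exon.extract(main)` yields the six nucleotides at positions 3 to 8, and `start_codons` holds a single region spanning positions 3 to 5.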
Transcripts can be generated by selecting from a list of previously defined transcriptional regions, referred to as "Nucleotide Regions". When an mRNA transcript is created, a check box can be used to avoid translating flanking regions such as the 5' and 3' UTRs. By default the check box remains checked, so that the entire construct is translated, as in the case of a coding sequence (CDS).
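As a rough illustration of this modelling step, the Python sketch below joins a chosen list of regions into a transcript and translates it with the standard genetic code. It is an assumption-laden sketch rather than Djinn Lite's implementation: the `Region` tuple layout, the function names, the `translate_entire_construct` flag (mirroring the check box) and the choice to stop at the first in-frame stop codon are all hypothetical.

```python
from typing import List, Tuple

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# (name, start, end, is_untranslated); positions are 1-based and inclusive.
Region = Tuple[str, int, int, bool]


def build_transcript(main_sequence: str, regions: List[Region]) -> str:
    """Concatenate the selected regions in the order given."""
    return "".join(main_sequence[s - 1:e] for _, s, e, _ in regions).upper()


def translate(nucleotides: str) -> str:
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(nucleotides) - 2, 3):
        residue = CODON_TABLE.get(nucleotides[i:i + 3], "X")  # X = unknown codon
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def translate_model(main_sequence: str, regions: List[Region],
                    translate_entire_construct: bool = True) -> str:
    """Translate a transcript model, optionally leaving out flagged UTRs."""
    chosen = regions if translate_entire_construct else [r for r in regions if not r[3]]
    return translate(build_transcript(main_sequence, chosen))


# Toy gene: 5' UTR, exon 1, intron (not selected), exon 2, 3' UTR.
gene = "GGCC" + "ATGGCC" + "GTAAGT" + "AAATGA" + "TTTT"
model = [("5' UTR", 1, 4, True), ("exon 1", 5, 10, False),
         ("exon 2", 17, 22, False), ("3' UTR", 23, 26, True)]
coding_only = translate_model(gene, model, translate_entire_construct=False)
```

With the UTRs excluded the toy model translates to `MAK`; a second model that omits exon 2 could be built from the same region list to compare the resulting translations.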
Feature annotation and colour highlighting
Annotating and colour highlighting of a particular nucleotide or translated protein sequence can be carried out either by conducting a sequence search or by providing a sequence start and end position. Adding annotations using "Highlight Nucleotides" enables users to carry out searches against a list of previously defined transcripts and CDSs, as well as the main input sequence (e.g. genomic sequence). There are 15 colours to choose from for the highlighting of sequence regions and macro-level depictions (a bar that displays colour annotations over graphical box representations of the gene, transcripts, and proteins). Thus, colour highlights can be overlaid on top of sequence regions including 5' transcriptional control elements, promoters, translational control elements, start and stop codons, 3' polyadenylation signals, binding sites for transcription factors, splicing enhancer/silencing elements, polymorphic variants, SNPs, mutations, microRNA and small interfering RNA (siRNA) targets, RNA editing sequences, protein domains, motifs and protein binding regions, PCR-covered regions, and putative regions or regions requiring experimental validation (Figure 1 and Table 1). Users have the discretion and flexibility to make any annotation of their choosing. Also, grey colour highlights are automatically generated by Djinn Lite where colour-coded annotations overlap along a sequence. Each annotation can also carry a textual description, which appears in the legends in both the "Sequence Reports" and "Graphical View" pages.
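The grey-on-overlap behaviour can be pictured with a short Python sketch. It assumes a simple per-position colour track and 1-based inclusive annotation coordinates; the `colour_track` function and the `Annotation` tuple are hypothetical names, and Djinn Lite's actual rendering logic may differ.

```python
from typing import List, Optional, Tuple

Annotation = Tuple[int, int, str]  # (start, end, colour); 1-based, inclusive


def colour_track(length: int, annotations: List[Annotation]) -> List[Optional[str]]:
    """Assign a colour to each position; overlapping annotations become grey."""
    track: List[Optional[str]] = [None] * length
    for start, end, colour in annotations:
        # Clip each annotation to the bounds of the sequence.
        for pos in range(max(start, 1), min(end, length) + 1):
            track[pos - 1] = colour if track[pos - 1] is None else "grey"
    return track


track = colour_track(6, [(1, 3, "red"), (3, 5, "blue")])
```

In this example position 3 is covered by both annotations and is therefore reported as grey, while position 6 carries no highlight.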
Graphical representation and viewing sequences
Djinn Lite uses multiple rows, or tracks, to handle the complexity of genomic sequence annotations; this accommodates numerous annotations and incorporates a multidimensional data space (sequence, transcript regions, features). The "View Sequences" tab is the "working display" window, showing the sequence(s) (base pairs) in their entirety, viewable with a horizontal scroll bar.
The user can toggle between the macro view, identified as "Graphical Overview", and "View Sequences". The "Graphical Overview" provides the user with a high-level pictorial representation of the gene, transcripts and protein.
The "Graphical Overview" was designed to be particularly useful for gaining an overview of the physical size of the gene map and its associated transcripts, in terms of the relative sizes of the introns, exons, and the density of features along these maps. The sizes of the transcripts and their associated proteins are scaled relative to each other. Thus, an inspection of key global differences between transcripts can provide clues to dissimilarities in transcriptional regions and protein domains or motifs.
Within the "Graphical Overview" there are two sections, the genomic view and the transcripts view. Within the genomic view, boxes represent exons, narrow lines represent introns or non-genic regions, and below them is an annotation ruler that displays colour bars marking featured annotations at their corresponding relative locations along the genomic map. Djinn Lite is also able to depict "overlapping exons" and "overlapping untranslated regions", as can occur due to the plasticity of splice-site selection, as dark green boxes. The transcripts view depicts all of the defined transcripts, where boxes represent untranslated regions, exons or coding regions. An annotation ruler displays colour bars marking annotations at their corresponding relative locations along the transcript. Also, the translated transcript is displayed as an outlined box and can be overlaid with colour-coded bars corresponding to annotations on the protein.
The user is able to maintain context between the "Graphical Overview" and "View Sequences", as the "View Sequences" tab displays either transcript regions in the context of their genomic DNA or a protein in the context of its transcript. This allows the user to view sequence alignments and the numbering of nucleotide and amino acid positions in an integrated context.
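The position bookkeeping behind such an integrated view can be sketched in Python as follows. The sketch assumes forward-strand exons given in transcript order with 1-based inclusive genomic coordinates; the function names `transcript_to_genomic` and `codon_span` are hypothetical and are not taken from Djinn Lite.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # genomic (start, end); 1-based, inclusive, transcript order


def transcript_to_genomic(exons: List[Exon], transcript_pos: int) -> int:
    """Map a 1-based transcript position to its genomic position."""
    if transcript_pos < 1:
        raise ValueError("transcript positions start at 1")
    remaining = transcript_pos
    for start, end in exons:
        length = end - start + 1
        if remaining <= length:
            return start + remaining - 1
        remaining -= length  # skip this exon and continue in the next one
    raise ValueError("position lies beyond the end of the transcript")


def codon_span(residue_number: int, cds_start: int = 1) -> Tuple[int, int]:
    """Return the transcript positions of the codon encoding a given residue."""
    first = cds_start + 3 * (residue_number - 1)
    return first, first + 2


exons = [(5, 10), (17, 22)]
first_base_of_exon_2 = transcript_to_genomic(exons, 7)
```

Chaining the two helpers links an amino acid number to a codon on the transcript and then to genomic coordinates, which is the kind of association the synchronised numbering makes visible on screen.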
Previewing/printing/exporting sequences and picture representations
The second option, the "Sequence Reports" tab, enables printing of sequences alongside aligned regions and amino acid sequences, with colour codes overlaid on the sequences. This results in a table representation of the sequences, including a legend for the annotations at the end of the report. Printing is context specific, i.e. transcript regions are aligned in the context of their genomic DNA or, optionally, a protein is aligned in the context of its transcript. The width of the table can be modified, allowing users to choose from a large range of sequence line lengths to suit their viewing requirements. The report can then be copied and pasted, which supports report writing and, ultimately, publication.
Djinn Lite also allows for exporting of gene, transcript, CDS, transcriptional nucleotide regions (exons and untranslated regions) and protein sequences in FASTA format. All accumulated sequences, transcript models and annotations can be saved as a text formatted file. This text file can be easily copied into Microsoft Excel for further manipulation or analysis.
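For readers unfamiliar with FASTA, the export amounts to a header line beginning with `>` followed by the sequence wrapped over several lines. The Python sketch below shows the idea; the `to_fasta` function and the line width of 60 are assumptions, and the record names that Djinn Lite itself writes are not specified here.

```python
def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record with wrapped lines."""
    lines = [f">{name}"]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


record = to_fasta("exon 1", "ATGGCCAAATGA", width=6)
```

Several records can be concatenated into a single file, for example one record per exon, transcript or protein.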
Software design limitations
Once a main sequence entry has been initially processed in Djinn Lite, the nucleotide sequence remains static, and subsequent nucleotide changes (additions/deletions/substitutions) within a saved gene configuration are not allowed. Thus, Djinn Lite is not designed to automatically update downstream gene product sequences when alterations are made to the inputted nucleotide sequence. Real-time changing of nucleotides at the main sequence level would be a useful feature for observing the effects of nucleotide changes at downstream levels, including changes to splicing, domains/motifs, and amino acids. However, it would force the program to respond to a multitude of subsequent effects, including changes to regions and transcript variants. This feature was not implemented, to prevent the user interface from becoming too complex, as it could potentially yield multiple user notifications highlighting the many subsequent downstream sequence alterations. Instead, a separate Djinn session and file can be set up to accommodate each sequence variation of a gene or sequence entry. For example, Djinn Lite can be used for haplotype mapping, where each distinct haplotype (a distinct set of polymorphic variations which are inherited as a unit) can be set up as a separate Djinn file.
Again, in an effort to maintain the simplicity of Djinn Lite's user interface, some features which are biologically relevant to mRNA were not implemented. We believed that these features were not critical to the main emphasis and utility of Djinn Lite, which are ease of use and uncomplicated transcript modelling, annotation and visualisation. For example, thymine (T) is not replaced by uracil (U) when DNA is transformed to RNA. This aspect may be more important for programs that provide RNA secondary structure analysis, as uracil and thymine have different hybridization properties; this was not part of Djinn Lite's design scope. Djinn Lite does not model the addition of multiple adenosine nucleotides to the 3' end of defined transcripts (polyadenylation) for subsequently defined and transformed downstream sequences. Likewise, RNA editing, whether by substitution, deletion or insertion, is not modelled. Also, Djinn Lite does not permit loose sequence alignments.
Djinn Lite represents a tool for the process of "annotation data enrichment", which involves the incremental gathering, qualifying and experimental verification of both putative and documented gene sequence annotations. Djinn Lite provides display of annotations, easy-to-follow numbering of aligned sequences, creation of alternative transcript models using novel combinations of exons, and flexible printing options for annotated sequences and gene/transcript maps. The interface is intuitive, requiring only a short learning curve, and helps users to quickly accumulate and individualise sequence information on genes and their downstream products, including sequence annotations relating to transcriptional/translational regulation, post-translational modifications and protein interactions. Djinn Lite can provide storage of gene annotations for personalised projects on particular genes of interest and can therefore be the basis of valuable clues for planning new experiments, meeting the needs of biologists whose key concern is to judiciously plan and conduct experiments. Ultimately, extensive use of such a tool can help to improve the accuracy and comprehensiveness of genome-wide annotations. Additionally, Djinn Lite may be a useful teaching aid to support undergraduate students learning about topics related to gene structure.
Availability and requirements
Project name: Customised gene transcript modeling, annotating and exploring
Project home page: http://www.sbio.pharm.usyd.edu.au/DjinnLite
Operating system: Microsoft Windows 98/ME/2000/XP/2003
Programming language: Visual Basic Version 6
Other requirements: None
Licenses: Executable is freeware
Any restrictions to use by non-academics: None
We are grateful to Tim Peters, Alex Shaw, Justine Lau and Rainsy Tang for their contributions.
- Foster MW, Sharp RR: Beyond race: towards a whole-genome perspective on human populations and genetic variation. Nat Rev Genet 2004, 5: 790–796. 10.1038/nrg1452
- The International HapMap Project. Nature 2003, 426: 789–796. 10.1038/nature02168
- Kampa D, Cheng J, Kapranov P, Yamanaka M, Brubaker S, Cawley S, Drenkow J, Piccolboni A, Bekiranov S, Helt G, Tammana H, Gingeras TR: Novel RNAs identified from an in-depth analysis of the transcriptome of human chromosomes 21 and 22. Genome Res 2004, 14: 331–342. 10.1101/gr.2094104
- Yeo G, Holste D, Kreiman G, Burge CB: Variation in alternative splicing across human tissues. Genome Biol 2004, 5: R74. 10.1186/gb-2004-5-10-r74
- Consensus CDS project - European Bioinformatics Institute (EBI), National Center for Biotechnology Information (NCBI), the Wellcome Trust Sanger Institute (WTSI), and the University of California, Santa Cruz (UCSC). 2005. [http://www.ncbi.nlm.nih.gov/projects/CCDS]
- Finishing the euchromatic sequence of the human genome. Nature 2004, 431: 931–945. 10.1038/nature03001
- Ashurst JL, Collins JE: Gene annotation: prediction and testing. Annu Rev Genomics Hum Genet 2003, 4: 69–88. 10.1146/annurev.genom.4.070802.110300
- Dike S, Balija VS, Nascimento LU, Xuan Z, Ou J, Zutavern T, Palmer LE, Hannon G, Zhang MQ, McCombie WR: The mouse genome: experimental examination of gene predictions and transcriptional start sites. Genome Res 2004, 14: 2424–2429. 10.1101/gr.3158304
- Pennisi E: Bioinformatics. Gene counters struggle to get the right answer. Science 2003, 301: 1040–1041. 10.1126/science.301.5636.1040
- Gupta S, Zink D, Korn B, Vingron M, Haas SA: Strengths and weaknesses of EST-based prediction of tissue-specific alternative splicing. BMC Genomics 2004, 5: 72. 10.1186/1471-2164-5-72
- Wesche PL, Gaffney DJ, Keightley PD: DNA sequence error rates in Genbank records estimated using the mouse genome as a reference. DNA Seq 2004, 15: 362–364.
- Karlin S, Bergman A, Gentles AJ: Genomics. Annotation of the Drosophila genome. Nature 2001, 411: 259–260. 10.1038/35077152
- Hillarp A, Palmqvist B, Lethagen S, Villoutreix BO, Mattiasson I: Mutations within the cyclooxygenase-1 gene in aspirin non-responders with recurrence of stroke. Thrombosis Research 2003, 112: 275–283. 10.1016/j.thromres.2003.12.005
- Maglott D, Ostell J, Pruitt KD, Tatusova T: Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res 2005, 33: D54–8. 10.1093/nar/gki031
- Hubbard T, Andrews D, Caccamo M, Cameron G, Chen Y, Clamp M, Clarke L, Coates G, Cox T, Cunningham F, Curwen V, Cutts T, Down T, Durbin R, Fernandez-Suarez XM, Gilbert J, Hammond M, Herrero J, Hotz H, Howe K, Iyer V, Jekosch K, Kahari A, Kasprzyk A, Keefe D, Keenan S, Kokocinsci F, London D, Longden I, McVicker G, Melsopp C, Meidl P, Potter S, Proctor G, Rae M, Rios D, Schuster M, Searle S, Severin J, Slater G, Smedley D, Smith J, Spooner W, Stabenau A, Stalker J, Storey R, Trevanion S, Ureta-Vidal A, Vogel J, White S, Woodwark C, Birney E: Ensembl 2005. Nucleic Acids Res 2005, 33: D447–53. 10.1093/nar/gki138
- Wang WY, Barratt BJ, Clayton DG, Todd JA: Genome-wide association studies: theoretical and practical concerns. Nat Rev Genet 2005, 6: 109–118. 10.1038/nrg1522
- Zavolan M, Kondo S, Schonbach C, Adachi J, Hume DA, Hayashizaki Y, Gaasterland T: Impact of alternative initiation, splicing, and termination on the diversity of the mRNA transcripts encoded by the mouse transcriptome. Genome Res 2003, 13: 1290–1300. 10.1101/gr.1017303
- Okazaki Y, Furuno M, Kasukawa T, Adachi J, Bono H, Kondo S, Nikaido I, Osato N, Saito R, Suzuki H, Yamanaka I, Kiyosawa H, Yagi K, Tomaru Y, Hasegawa Y, Nogami A, Schonbach C, Gojobori T, Baldarelli R, Hill DP, Bult C, Hume DA, Quackenbush J, Schriml LM, Kanapin A, Matsuda H, Batalov S, Beisel KW, Blake JA, Bradt D, Brusic V, Chothia C, Corbani LE, Cousins S, Dalla E, Dragani TA, Fletcher CF, Forrest A, Frazer KS, Gaasterland T, Gariboldi M, Gissi C, Godzik A, Gough J, Grimmond S, Gustincich S, Hirokawa N, Jackson IJ, Jarvis ED, Kanai A, Kawaji H, Kawasawa Y, Kedzierski RM, King BL, Konagaya A, Kurochkin IV, Lee Y, Lenhard B, Lyons PA, Maglott DR, Maltais L, Marchionni L, McKenzie L, Miki H, Nagashima T, Numata K, Okido T, Pavan WJ, Pertea G, Pesole G, Petrovsky N, Pillai R, Pontius JU, Qi D, Ramachandran S, Ravasi T, Reed JC, Reed DJ, Reid J, Ring BZ, Ringwald M, Sandelin A, Schneider C, Semple CA, Setou M, Shimada K, Sultana R, Takenaka Y, Taylor MS, Teasdale RD, Tomita M, Verardo R, Wagner L, Wahlestedt C, Wang Y, Watanabe Y, Wells C, Wilming LG, Wynshaw-Boris A, Yanagisawa M, Yang I, Yang L, Yuan Z, Zavolan M, Zhu Y, Zimmer A, Carninci P, Hayatsu N, Hirozane-Kishikawa T, Konno H, Nakamura M, Sakazume N, Sato K, Shiraki T, Waki K, Kawai J, Aizawa K, Arakawa T, Fukuda S, Hara A, Hashizume W, Imotani K, Ishii Y, Itoh M, Kagawa I, Miyazaki A, Sakai K, Sasaki D, Shibata K, Shinagawa A, Yasunishi A, Yoshino M, Waterston R, Lander ES, Rogers J, Birney E, Hayashizaki Y: Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature 2002, 420: 563–573. 10.1038/nature01266
- Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L: Initial sequencing and analysis of the human genome. Nature 2001, 409: 860–921. 10.1038/35057062
- Loraine AE, Helt GA: Visualizing the genome: techniques for presenting human genome data and annotations. BMC Bioinformatics 2002, 3: 19–26. 10.1186/1471-2105-3-19
- Stamm S, Ben-Ari S, Rafalska I, Tang Y, Zhang Z, Toiber D, Thanaraj TA, Soreq H: Function of alternative splicing. Gene 2005, 344: 1–20. 10.1016/j.gene.2004.10.022
- Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream MA, Barrell B: Artemis: sequence visualization and annotation. Bioinformatics 2000, 16: 944–945. 10.1093/bioinformatics/16.10.944
- Hu Z, Frith M, Niu T, Weng Z: SeqVISTA: a graphical tool for sequence feature visualization and comparison. BMC Bioinformatics 2003, 4: 1. 10.1186/1471-2105-4-1
- Harris NL: Annotating sequence data using Genotator. Mol Biotechnol 2000, 16: 221–232. 10.1385/MB:16:3:221
- Harris NL: Genotator: a workbench for sequence annotation. Genome Res 1997, 7: 754–762.
- Karolchik D, Baertsch R, Diekhans M, Furey TS, Hinrichs A, Lu YT, Roskin KM, Schwartz M, Sugnet CW, Thomas DJ, Weber RJ, Haussler D, Kent WJ: The UCSC Genome Browser Database. Nucleic Acids Res 2003, 31: 51–54. 10.1093/nar/gkg129
- Birney E, Andrews D, Bevan P, Caccamo M, Cameron G, Chen Y, Clarke L, Coates G, Cox T, Cuff J, Curwen V, Cutts T, Down T, Durbin R, Eyras E, Fernandez-Suarez XM, Gane P, Gibbins B, Gilbert J, Hammond M, Hotz H, Iyer V, Kahari A, Jekosch K, Kasprzyk A, Keefe D, Keenan S, Lehvaslaiho H, McVicker G, Melsopp C, Meidl P, Mongin E, Pettett R, Potter S, Proctor G, Rae M, Searle S, Slater G, Smedley D, Smith J, Spooner W, Stabenau A, Stalker J, Storey R, Ureta-Vidal A, Woodwark C, Clamp M, Hubbard T: Ensembl 2004. Nucleic Acids Res 2004, 32: D468–70. 10.1093/nar/gkh038
- Vernikos GS, Gkogkas CG, Promponas VJ, Hamodrakas SJ: GeneViTo: visualizing gene-product functional and structural features in genomic datasets. BMC Bioinformatics 2003, 4: 53. 10.1186/1471-2105-4-53
- Safran M, Solomon I, Shmueli O, Lapidot M, Shen-Orr S, Adato A, Ben-Dor U, Esterman N, Rosen N, Peter I, Olender T, Chalifa-Caspi V, Lancet D: GeneCards 2002: towards a complete, object-oriented, human gene compendium. Bioinformatics 2002, 18: 1542–1543. 10.1093/bioinformatics/18.11.1542
- Kerlavage A, Bonazzi V, di Tommaso M, Lawrence C, Li P, Mayberry F, Mural R, Nodell M, Yandell M, Zhang J, Thomas P: The Celera Discovery System. Nucleic Acids Res 2002, 30: 129–136. 10.1093/nar/30.1.129
- Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D: The Human Genome Browser at UCSC. Genome Res 2002, 12: 996–1006. 10.1101/gr.229102. Article published online before print in May 2002
- Thanaraj TA, Stamm S, Clark F, Riethoven JJ, Le Texier V, Muilu J: ASD: the Alternative Splicing Database. Nucleic Acids Res 2004, 32: D64–9. 10.1093/nar/gkh030
- Mignone F, Gissi C, Liuni S, Pesole G: Untranslated regions of mRNAs. Genome Biol 2002, 3: REVIEWS0004. 10.1186/gb-2002-3-3-reviews0004
- Zuker M: Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 2003, 31: 3406–3415. 10.1093/nar/gkg595
- Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Natale DA, O'Donovan C, Redaschi N, Yeh LS: The Universal Protein Resource (UniProt). Nucleic Acids Res 2005, 33: D154–9. 10.1093/nar/gki070
- Wu C, Nebert DW: Update on genome completion and annotations: Protein Information Resource. Hum Genomics 2004, 1: 229–233.
- Boeckmann B, Bairoch A, Apweiler R, Blatter MC, Estreicher A, Gasteiger E, Martin MJ, Michoud K, O'Donovan C, Phan I, Pilbout S, Schneider M: The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 2003, 31: 365–370. 10.1093/nar/gkg095
- Smigielski EM, Sirotkin K, Ward M, Sherry ST: dbSNP: a database of single nucleotide polymorphisms. Nucleic Acids Res 2000, 28: 352–355. 10.1093/nar/28.1.352
- Letunic I, Copley RR, Schmidt S, Ciccarelli FD, Doerks T, Schultz J, Ponting CP, Bork P: SMART 4.0: towards genomic data integration. Nucleic Acids Res 2004, 32: D142–4. 10.1093/nar/gkh088
- Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA: Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res 2005, 33: D514–7. 10.1093/nar/gki033
- Stenson PD, Ball EV, Mort M, Phillips AD, Shiel JA, Thomas NS, Abeysinghe S, Krawczak M, Cooper DN: Human Gene Mutation Database (HGMD): 2003 update. Hum Mutat 2003, 21: 577–581. 10.1002/humu.10212
- Marinescu VD, Kohane IS, Riva A: MAPPER: a search engine for the computational identification of putative transcription factor binding sites in multiple genomes. BMC Bioinformatics 2005, 6: 79. 10.1186/1471-2105-6-79
- Cartegni L, Wang J, Zhu Z, Zhang MQ, Krainer AR: ESEfinder: a web resource to identify exonic splicing enhancers. Nucleic Acids Res 2003, 31: 3568–3571. 10.1093/nar/gkg616
- Fairbrother WG, Yeh RF, Sharp PA, Burge CB: Predictive identification of exonic splicing enhancers in human genes. Science 2002, 297: 1007–1013. 10.1126/science.1073774
- Prestridge DS: Predicting Pol II promoter sequences using transcription factor binding sites. J Mol Biol 1995, 5: 923–932. 10.1006/jmbi.1995.0349
- Kiemer L, Bendtsen JD, Blom N: NetAcet: prediction of N-terminal acetylation sites. Bioinformatics 2005, 21: 1269–1270. 10.1093/bioinformatics/bti130
- Julenius K, Molgaard A, Gupta R, Brunak S: Prediction, conservation analysis, and structural characterization of mammalian mucin-type O-glycosylation sites. Glycobiology 2005, 15: 153–164. 10.1093/glycob/cwh151
- Blom N, Gammeltoft S, Brunak S: Sequence and structure-based prediction of eukaryotic protein phosphorylation sites. J Mol Biol 1999, 294: 1351–1362. 10.1006/jmbi.1999.3310View ArticlePubMedGoogle Scholar
- Duckert P, Brunak S, Blom N: Prediction of proprotein convertase cleavage sites. Protein Eng Des Sel 2004, 17: 107–112. 10.1093/protein/gzh013View ArticlePubMedGoogle Scholar
- Ng SK, Zhang Z, Tan SH, Lin K: InterDom: a database of putative interacting protein domains for validating predicted protein interactions and complexes. Nucleic Acids Res 2003, 31: 251–254. 10.1093/nar/gkg079PubMed CentralView ArticlePubMedGoogle Scholar
- Alfarano C, Andrade CE, Anthony K, Bahroos N, Bajec M, Bantoft K, Betel D, Bobechko B, Boutilier K, Burgess E, Buzadzija K, Cavero R, D'Abreo C, Donaldson I, Dorairajoo D, Dumontier MJ, Dumontier MR, Earles V, Farrall R, Feldman H, Garderman E, Gong Y, Gonzaga R, Grytsan V, Gryz E, Gu V, Haldorsen E, Halupa A, Haw R, Hrvojic A, Hurrell L, Isserlin R, Jack F, Juma F, Khan A, Kon T, Konopinsky S, Le V, Lee E, Ling S, Magidin M, Moniakis J, Montojo J, Moore S, Muskat B, Ng I, Paraiso JP, Parker B, Pintilie G, Pirone R, Salama JJ, Sgro S, Shan T, Shu Y, Siew J, Skinner D, Snyder K, Stasiuk R, Strumpf D, Tuekam B, Tao S, Wang Z, White M, Willis R, Wolting C, Wong S, Wrong A, Xin C, Yao R, Yates B, Zhang S, Zheng K, Pawson T, Ouellette BF, Hogue CW: The Biomolecular Interaction Network Database and related tools 2005 update. Nucleic Acids Res 2005, 33: D418–24. 10.1093/nar/gki051PubMed CentralView ArticlePubMedGoogle Scholar
- Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D: The Database of Interacting Proteins: 2004 update. Nucleic Acids Res 2004, 32: D449–51. 10.1093/nar/gkh086PubMed CentralView ArticlePubMedGoogle Scholar
- Zanzoni A, Montecchi-Palazzi L, Quondam M, Ausiello G, Helmer-Citterich M, Cesareni G: MINT: a Molecular INTeraction database. FEBS Letters 2002, 513: 135–140. 10.1016/S0014-5793(01)03293-8View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.