Gene searching and discovery
As an information and gene discovery system, wFleaBase focuses on providing efficient tools for searching and retrieving records of interest. Its current features are best illustrated by following a user who wants to locate ecologically relevant genes, for example, genes that confer resistance to the elevated levels of ultraviolet radiation encountered by species closely related to D. pulex. Beginning at the welcome page, the user follows the hyperlink in the top menu to the Blast page of wFleaBase (http://wfleabase.org/blast/) to perform sequence-similarity searches on the archived data using the BLAST family of programs. The user enters a nucleotide sequence from a gene whose function is well characterized and evolutionarily conserved, with the goal of finding the homologous gene in Daphnia. For example, a Drosophila melanogaster mRNA sequence for the gene photorepair, obtained from GenBank (NM_165564) or from FlyBase (FBgn0003082), is used to query all Daphnia sequences using the default settings of the tblastx program. Alternatively, the user can choose to query species-specific GSS or EST databases. This search retrieves record WFgs0000440, a 917-nucleotide sequence whose best match has a score of 83 bits and an E-value of 5e-45. Using this information, the user can download the Daphnia sequence to a personal computer as a text file, design primers using their own software to probe the arrayed Daphnia cosmid library by the polymerase chain reaction (PCR), identify bacterial clones containing the gene, and characterize the entire locus by sequencing. Indeed, this specific exercise identifies at least three cosmids (out of 37,000) containing a likely homologue of photorepair from Drosophila [13].
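The step of picking the best match from a similarity search can be sketched in a few lines of Python. This is only an illustration, not part of wFleaBase: it assumes the search results have been saved in the standard 12-column tab-delimited BLAST format (query, subject, percent identity, alignment length, mismatches, gap openings, query start and end, subject start and end, E-value, bit score). The names `Hit`, `parse_tabular`, `best_hit` and `EXAMPLE`, the E-value cutoff, and every value in the example rows are invented for the sketch.

```python
from typing import Iterable, List, NamedTuple, Optional


class Hit(NamedTuple):
    query: str
    subject: str
    evalue: float
    bitscore: float


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse 12-column tab-delimited BLAST output (assumed format)."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comment lines
            continue
        f = line.split("\t")
        # Columns 11 and 12 (indices 10 and 11) hold E-value and bit score.
        hits.append(Hit(f[0], f[1], float(f[10]), float(f[11])))
    return hits


def best_hit(hits: List[Hit], max_evalue: float = 1e-5) -> Optional[Hit]:
    """Return the hit with the lowest E-value (ties broken by bit score)."""
    kept = [h for h in hits if h.evalue <= max_evalue]
    if not kept:
        return None
    return min(kept, key=lambda h: (h.evalue, -h.bitscore))


# Invented example rows; identifiers and numbers are placeholders only.
EXAMPLE = "\n".join([
    "query1\tsubjectA\t62.5\t120\t45\t0\t1\t360\t10\t369\t3e-20\t95.1",
    "query1\tsubjectB\t40.0\t80\t48\t1\t400\t639\t5\t244\t0.002\t42.4",
])

if __name__ == "__main__":
    print(best_hit(parse_tabular(EXAMPLE.splitlines())))
```

The identifier of the best hit is what the user would then use to download the matching Daphnia sequence for primer design.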
Returning to the welcome page, the user can instead choose to explore tables of results from automated BLAST searches against the euGenes database, which includes annotated genome sequences from 10 eukaryotic model organisms. Although this option for gene searching is more tedious, it allows users to focus precisely on the data currently available in wFleaBase. Four tables of BLAST results are offered at http://wfleabase.org/genomics/ and are reached by following the "Genomics" hyperlinks in the top and side menus. At present, Daphnia EST and GSS sequences are each compared to the protein-coding genes and to the genomic sequences in euGenes. The BLAST tables can be sorted in many ways. The user can specify which BLAST result columns to show and can sort the table by any of these columns in ascending or descending order. The tables can include BLAST results against all organisms within euGenes or can be filtered to show comparisons against a single taxon. For example, the same user, now looking for a Daphnia homologue of genes known to confer salt resistance on species inhabiting saline environments, begins by searching the Blast tables for the names or euGenes accession numbers of functionally related genes, using the wFleaBase search function located in the top menu of the Blast tables. If the user searches for "ATPα", a sodium/potassium-exchanging ATPase shown to be under positive selection in brine shrimp populations adapted to ultra-saline waters [14], 11 EST records that match ATPα in fly are discovered, with bit scores and E-values ranging from 42.36 and 0.002 for the weakest match to 327.0 and 2.2e-89 for the strongest. The user can retrieve the Daphnia sequences via hyperlinks in the first column of the search results, or further explore the extent of evolutionary conservation of this gene by examining the euGene Reports via hyperlinks in the last column.
Alternatively, if the user enters the FlyBase accession number for this gene (FBgn0002921) in the search function, the same 11 Daphnia records are obtained.
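The table operations described above (searching by gene name or accession number, filtering to a single taxon, and sorting by a chosen column) can be mimicked with a small Python sketch. It does not reproduce the wFleaBase implementation. The column names and the functions `search`, `filter_taxon` and `sort_by` are hypothetical, the Daphnia record identifiers are placeholders, and the third row is invented purely to show filtering; only the FlyBase accession and the two score pairs in the first two rows are taken from the text.

```python
from typing import Dict, List

Row = Dict[str, object]

# Toy table. Only FBgn0002921 and the two (bits, evalue) pairs for ATPalpha
# come from the text; everything else is a placeholder.
ROWS: List[Row] = [
    {"daphnia_id": "EST_placeholder_1", "gene": "ATPalpha",
     "accession": "FBgn0002921", "taxon": "fly", "bits": 42.36, "evalue": 0.002},
    {"daphnia_id": "EST_placeholder_2", "gene": "ATPalpha",
     "accession": "FBgn0002921", "taxon": "fly", "bits": 327.0, "evalue": 2.2e-89},
    {"daphnia_id": "EST_placeholder_3", "gene": "geneX",
     "accession": "ACC0000001", "taxon": "worm", "bits": 60.0, "evalue": 1e-8},
]


def search(rows: List[Row], term: str) -> List[Row]:
    """Keep rows whose gene name or accession equals the term (case-insensitive)."""
    t = term.lower()
    return [r for r in rows
            if t in (str(r["gene"]).lower(), str(r["accession"]).lower())]


def filter_taxon(rows: List[Row], taxon: str) -> List[Row]:
    """Keep only comparisons against a single taxon."""
    return [r for r in rows if r["taxon"] == taxon]


def sort_by(rows: List[Row], column: str, descending: bool = False) -> List[Row]:
    """Sort rows by one column in ascending or descending order."""
    return sorted(rows, key=lambda r: r[column], reverse=descending)


if __name__ == "__main__":
    by_name = search(ROWS, "ATPalpha")
    by_accession = search(ROWS, "FBgn0002921")
    print(by_name == by_accession)  # both routes select the same rows
    for row in sort_by(filter_taxon(by_name, "fly"), "bits", descending=True):
        print(row["daphnia_id"], row["bits"], row["evalue"])
```

In this sketch, searching by gene name and by accession number select the same rows because both values are stored with every record, which mirrors the behaviour described for the two ATPα queries.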
Tools for hunting unknown genes
Although effective, the candidate gene approach to finding Daphnia genes of ecological interest is limited by the levels of sequence and functional conservation among characterized genes in other model organisms. Work is underway by the DGC to create the tools required for identifying ecologically relevant genes by positional mapping using microsatellite markers. wFleaBase presently archives 528 microsatellite markers [15]. To generate additional loci for genetic mapping in D. pulex and D. magna, wFleaBase integrates a suite of computational programs that (i) identifies microsatellites from raw DNA sequencer trace files, (ii) designs optimal primers for amplifying the markers and (iii) indexes the amplicon, microsatellite motifs and primer information into the Microsat database [16]. The Microsat database will grow rapidly as this pipeline is applied to trace files emerging from the Daphnia genome sequencing project.
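Step (i) of such a pipeline can be illustrated with a short Python sketch. It is not the wFleaBase software: it assumes the trace file has already been base-called into a plain nucleotide string, and it uses a simple regular expression rather than whatever method the actual programs apply. The function name `find_microsatellites`, the motif length range of 2 to 6 bases and the minimum of 5 repeats are assumptions chosen for illustration. Primer design (step ii) is left to dedicated tools and is not shown.

```python
import re
from typing import Dict, List


def find_microsatellites(seq: str, min_repeats: int = 5) -> List[Dict[str, object]]:
    """Find perfect tandem repeats of a 2-6 bp motif (illustrative thresholds).

    Returns one record per repeat with the motif, repeat count and
    0-based half-open coordinates, ready to be stored in a database.
    """
    # A lazily matched motif followed by at least (min_repeats - 1) exact copies.
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_repeats - 1))
    records = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        if len(set(motif)) == 1:  # skip homopolymer runs such as AAAA...
            continue
        records.append({
            "motif": motif,
            "repeats": len(m.group(0)) // len(motif),
            "start": m.start(),
            "end": m.end(),
        })
    return records


if __name__ == "__main__":
    # Invented test sequence: eight AC repeats between two short flanks.
    example = "GGTCCT" + "AC" * 8 + "TTGGCA"
    for record in find_microsatellites(example):
        print(record)
```

The coordinates returned for each repeat delimit the flanking sequence in which a primer-design program would then search for amplification primers.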
The wFleaBase search function
wFleaBase uses LuceGene to support rapid search and retrieval of the sequence database, Blast table entries, Daphnia Medline references and Daphnia web documents. LuceGene [17], based on the Lucene [18] search system, is an open-source component of the GMOD project. A major benefit of LuceGene is the wide variety of data formats that can be added to the search system with minimal work. Formats currently supported in wFleaBase include simple text, XML (Medline abstracts and gene sequence annotation), HTML, tabular data, bio-formats (Fasta, GenBank, EMBL) and the gene object data used by euGenes. Search terms, such as "magna" to retrieve all sequences from this species, can be entered in the Search box at the head of every web page. The search can be refined at the main wFleaBase search page by specifying the search library (sequences, references, documents or Blast tables) and the library fields that must contain the queried term. Options are also available to set the output format, and each result is hyperlinked to its source document for easy access to the data. On a separate web page (Batch download), users can retrieve multiple records obtained from complex queries and save the results to a file.
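The idea of restricting a query to one library and to selected fields can be illustrated with a toy inverted index in Python. This sketch does not use or reproduce the LuceGene or Lucene APIs. The class `TinyIndex`, its methods, the field names and the example documents are all hypothetical and serve only to show how a term such as "magna" can be matched within a chosen library and field.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


class TinyIndex:
    """Toy fielded index: (library, token) -> field -> set of document ids."""

    def __init__(self) -> None:
        self._postings = defaultdict(lambda: defaultdict(set))

    def add(self, doc_id: str, library: str, fields: Dict[str, str]) -> None:
        """Index every word of every field of one document."""
        for field, text in fields.items():
            for token in re.findall(r"\w+", text.lower()):
                self._postings[(library, token)][field].add(doc_id)

    def search(self, term: str, library: str,
               fields: Optional[Iterable[str]] = None) -> List[str]:
        """Return ids of documents in `library` whose fields contain `term`.

        If `fields` is None, every field of the library is searched.
        """
        by_field = self._postings.get((library, term.lower()), {})
        wanted = None if fields is None else set(fields)
        hits = set()
        for field, ids in by_field.items():
            if wanted is None or field in wanted:
                hits |= ids
        return sorted(hits)


if __name__ == "__main__":
    index = TinyIndex()
    # Invented documents for illustration only.
    index.add("s1", "sequences", {"species": "Daphnia magna", "description": "EST clone"})
    index.add("s2", "sequences", {"species": "Daphnia pulex", "description": "GSS clone"})
    index.add("r1", "references", {"title": "Daphnia magna ecology"})
    print(index.search("magna", "sequences"))
    print(index.search("magna", "references", fields=["title"]))
```

Keying the index by library first keeps a query against the sequence library from ever touching reference or document records, which is one simple way to obtain the library-specific searches described above.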